
Bio-inspired multisensory integration of social signals

Abstract

Understanding emotions is a core aspect of human communication. Our social behaviours are closely linked to expressing our own emotions and understanding others' emotional and mental states through social signals. Emotions are expressed in a multisensory manner: humans use social signals from different sensory modalities, such as facial expressions, vocal changes, and body language. The human brain integrates all relevant information to create a new multisensory percept and derives emotional meaning from it. There is great interest in emotion recognition in fields such as HCI, gaming, marketing, and assistive technologies, and this demand is driving an increase in research on multisensory emotion recognition. The majority of existing work proceeds by extracting meaningful features from each modality and applying fusion techniques at either the feature level or the decision level. However, these techniques cannot capture the constant cross-talk and feedback between modalities. Such cross-talk is particularly crucial in continuous emotion recognition, where one modality can predict, enhance, and complete another. This thesis proposes novel architectures for multisensory emotion recognition inspired by multisensory integration in the brain. First, we explore the use of bio-inspired unsupervised learning for unisensory emotion recognition in the audio and visual modalities. We then propose three multisensory integration models based on different pathways for multisensory integration in the brain: integration by convergence, early cross-modal enhancement, and integration through neural synchrony. The proposed models are designed and implemented using third-generation neural networks, namely spiking neural networks (SNNs) with unsupervised learning. The models are evaluated on widely adopted third-party datasets and compared to state-of-the-art multimodal fusion techniques, such as early, late, and deep-learning fusion. The evaluation results show that the three proposed models achieve results comparable to state-of-the-art supervised learning techniques. More importantly, this thesis presents models that capture the constant cross-talk between modalities during the training phase: each modality can predict, complement, and enhance the other through continuous feedback. This cross-talk between modalities offers additional insight into emotions compared to traditional fusion techniques.
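
The abstract contrasts the proposed brain-inspired models with conventional feature-level (early) and decision-level (late) fusion baselines. The sketch below is a minimal illustration of those two baseline strategies only, not of the thesis's models; the array shapes, the use of scikit-learn logistic regression, and the equal-weight probability averaging are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy stand-ins for per-modality features (e.g. audio and facial descriptors)
# and discrete emotion labels; shapes and values are illustrative only.
n_samples, n_audio, n_visual = 200, 16, 24
X_audio = rng.normal(size=(n_samples, n_audio))
X_visual = rng.normal(size=(n_samples, n_visual))
y = rng.integers(0, 4, size=n_samples)          # e.g. four emotion classes

# Early (feature-level) fusion: concatenate modality features,
# then train a single classifier on the joint vector.
X_early = np.concatenate([X_audio, X_visual], axis=1)
early_clf = LogisticRegression(max_iter=1000).fit(X_early, y)

# Late (decision-level) fusion: train one classifier per modality
# and combine their class probabilities, here by simple averaging.
audio_clf = LogisticRegression(max_iter=1000).fit(X_audio, y)
visual_clf = LogisticRegression(max_iter=1000).fit(X_visual, y)
late_probs = 0.5 * audio_clf.predict_proba(X_audio) + \
             0.5 * visual_clf.predict_proba(X_visual)
late_pred = late_probs.argmax(axis=1)
```

Both baselines treat the modalities as static feature sources combined at a single point, which is precisely the limitation the abstract attributes to them: there is no ongoing exchange between modalities during learning.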
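The proposed models themselves are built from spiking neural networks trained with unsupervised learning. The abstract does not name the learning rule, but spike-timing-dependent plasticity (STDP) is the canonical unsupervised rule for SNNs; the sketch below shows a pair-based STDP weight update between one presynaptic and one postsynaptic spike train under assumed constants (1 ms time step, time windows, learning rates), purely as an illustration of the technique class.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two binary spike trains over T time steps (1 ms per step), standing in
# for a presynaptic and a postsynaptic neuron; rates are arbitrary.
T = 1000
pre_spikes = rng.random(T) < 0.02
post_spikes = rng.random(T) < 0.02

# Pair-based STDP: potentiate when pre fires shortly before post,
# depress when post fires shortly before pre (exponential windows).
tau_plus, tau_minus = 20.0, 20.0     # ms, assumed trace time constants
a_plus, a_minus = 0.01, 0.012        # assumed learning rates
w, w_min, w_max = 0.5, 0.0, 1.0      # synaptic weight and bounds

pre_trace, post_trace = 0.0, 0.0     # low-pass filtered spike traces
for t in range(T):
    pre_trace += -pre_trace / tau_plus + pre_spikes[t]
    post_trace += -post_trace / tau_minus + post_spikes[t]
    if post_spikes[t]:               # pre-before-post pairing: potentiation
        w = min(w_max, w + a_plus * pre_trace)
    if pre_spikes[t]:                # post-before-pre pairing: depression
        w = max(w_min, w - a_minus * post_trace)

print(f"final weight: {w:.3f}")
```

Because the update depends only on relative spike timing, such a rule needs no labels, which is what makes it a natural building block for the unsupervised, feedback-driven integration the abstract describes.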


This thesis was published in the St Andrews Research Repository.
