A Deep Learning Feature Mapping Algorithm for Emotion Detection via Facial and Audio Signals

Tayarani, Mohammad, Shahid, Shamim, Foerster, Frank and Steuber, Volker (2026) A Deep Learning Feature Mapping Algorithm for Emotion Detection via Facial and Audio Signals. Applied Soft Computing, 197: 114998. ISSN 1568-4946

Automatic emotion recognition plays a critical role in areas such as mental-health monitoring, human–robot interaction, and personalised learning systems, yet current multimodal approaches often struggle with high intra-class variability and the limited discriminative power of raw audio–visual features. Existing methods typically classify audio or facial data directly, without explicitly enforcing a structured joint embedding in which emotional categories become separable. This paper addresses this limitation by proposing a supervised contrastive feature-mapping algorithm that transforms temporal audio and video features into a representation that minimises intra-class distances while maximising inter-class distances. In contrast to prior work, which usually relies on handcrafted feature engineering or end-to-end classifiers, our approach explicitly learns a discriminative metric space that improves the geometry of the feature distribution. The method is evaluated on the RAVDESS and CREMA-D benchmark datasets. Experimental results show that the proposed mapping yields consistent accuracy improvements over strong machine-learning baselines, with gains of up to approximately 6%, achieving 96.07% accuracy on RAVDESS and competitive performance on CREMA-D, while outperforming or matching recent state-of-the-art multimodal emotion-recognition pipelines. Statistical tests (Kruskal–Wallis and paired t-tests) confirm that the learned representation significantly increases class separability ($p < 10^{-5}$). The method assumes paired audio–visual inputs but requires no explicit temporal alignment, and the learned feature space is compact, discriminative, and well suited to downstream tasks such as affect-aware dialogue systems, rehabilitation monitoring, and adaptive educational interfaces. These results demonstrate that contrastive feature mapping provides a robust and generalisable framework for multimodal emotion analysis.
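The core idea of the abstract — mapping features so that same-emotion samples are pulled together and different-emotion samples pushed apart — can be illustrated with a toy contrastive objective. The function below is a minimal sketch, not the authors' implementation; the margin hinge and Euclidean metric are illustrative assumptions.

```python
import numpy as np

def contrastive_mapping_loss(embeddings, labels, margin=1.0):
    """Toy supervised contrastive objective: penalise distance between
    same-class pairs and penalise cross-class pairs that fall within
    `margin`. Names and details are illustrative, not from the paper."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distance matrix, shape (n, n).
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    intra = dist[same & off_diag]   # same-emotion pairs: minimise
    inter = dist[~same]             # cross-emotion pairs: push beyond margin
    loss = intra.mean() + np.maximum(0.0, margin - inter).mean()
    return loss, intra.mean(), inter.mean()
```

On well-separated clusters the intra-class term shrinks and the hinge term vanishes, so a mapping trained to minimise this loss produces exactly the tight, separable class geometry the abstract describes.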


A_deep_learning_feature_mapping_algorithm_for_emotion_detection_via_facial_and_audio_signals.pdf
Published Version
Available under Creative Commons: BY 4.0


