A Deep Learning Feature Mapping Algorithm for Emotion Detection via Facial and Audio Signals
Automatic emotion recognition plays a critical role in areas such as mental-health monitoring, human–robot interaction, and personalised learning systems, yet current multimodal approaches often struggle with high intra-class variability and the limited discriminative power of raw audio–visual features. Existing methods typically rely on direct classification of audio or facial data, which does not explicitly enforce a structured joint embedding in which emotional categories become separable. This paper addresses this limitation by proposing a supervised contrastive feature-mapping algorithm that transforms temporal audio and video features into a representation that minimises intra-class distances while maximising inter-class distances. In contrast to prior work, which usually focuses on handcrafted feature engineering or end-to-end classifiers, our approach explicitly learns a discriminative metric space that improves the geometry of the feature distribution. The method is evaluated on the RAVDESS and CREMA-D benchmark datasets. Experimental results show that the proposed mapping yields consistent accuracy improvements over strong machine-learning baselines, with gains of up to approximately 6%, achieving 96.07% accuracy on RAVDESS and competitive performance on CREMA-D, while outperforming or matching recent state-of-the-art multimodal emotion-recognition pipelines. Statistical tests (Kruskal–Wallis and paired t-tests) confirm that the learned representation significantly increases class separability ($p < 10^{-5}$). The method assumes the availability of paired audio–visual inputs but does not require explicit temporal alignment, and the learned feature space is compact, discriminative, and well suited to downstream tasks such as affect-aware dialogue systems, rehabilitation monitoring, and adaptive educational interfaces. These results demonstrate that contrastive feature mapping provides a robust and generalisable framework for multimodal emotion analysis.
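The supervised contrastive objective described in the abstract (pulling same-emotion embeddings together while pushing different emotions apart) can be sketched as follows. This is an illustrative NumPy implementation of a standard supervised contrastive loss, not the authors' exact formulation; the function name, temperature value, and normalisation choices are assumptions for demonstration.

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Illustrative supervised contrastive loss (not the paper's exact form):
    lower when same-label embeddings are close and different labels are far."""
    # L2-normalise so dot products become cosine similarities
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature                      # temperature-scaled similarities
    n = len(labels)
    mask_self = np.eye(n, dtype=bool)
    # exclude each anchor's similarity with itself, then take a log-softmax
    sim_masked = np.where(mask_self, -np.inf, sim)
    log_prob = sim_masked - np.log(np.exp(sim_masked).sum(axis=1, keepdims=True))
    # positives: other samples sharing the anchor's emotion label
    pos = (labels[:, None] == labels[None, :]) & ~mask_self
    # negated mean log-probability of positives, averaged over anchors
    per_anchor = -(np.where(pos, log_prob, 0.0).sum(1) / np.maximum(pos.sum(1), 1))
    return per_anchor.mean()
```

A mapping trained to minimise this loss yields the geometry the abstract describes: tight intra-class clusters with large inter-class margins, which is what makes the downstream classifier's job easier.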
| Item Type | Article |
|---|---|
| Identification Number | 10.1016/j.asoc.2026.114998 |
| Additional information | © 2026 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
| Date Deposited | 09 Apr 2026 08:07 |
| Last Modified | 09 Apr 2026 08:07 |
