Show simple item record

dc.contributor.author: Resende Faria, Diego
dc.contributor.author: Weinberg, Abraham Itzhak
dc.contributor.author: Ayrosa, Pedro Paulo
dc.date.accessioned: 2024-08-09T14:15:02Z
dc.date.available: 2024-08-09T14:15:02Z
dc.date.issued: 2024-07-29
dc.identifier.citation: Resende Faria, D, Weinberg, AI & Ayrosa, PP 2024, 'Multimodal Affective Communication Analysis: Fusing Speech Emotion and Text Sentiment Using Machine Learning', Applied Sciences, vol. 14, no. 15. https://doi.org/10.3390/app14156631
dc.identifier.issn: 2076-3417
dc.identifier.other: Jisc: 2171972
dc.identifier.other: publisher-id: applsci-14-06631
dc.identifier.uri: http://hdl.handle.net/2299/28092
dc.description: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
dc.description.abstract: Affective communication, encompassing verbal and non-verbal cues, is crucial for understanding human interactions. This study introduces a novel framework for enhancing emotional understanding by fusing speech emotion recognition (SER) and sentiment analysis (SA). We leverage diverse features and both classical and deep learning models, including Gaussian naive Bayes (GNB), support vector machines (SVMs), random forests (RFs), multilayer perceptron (MLP), and a 1D convolutional neural network (1D-CNN), to accurately discern and categorize emotions in speech. We further extract text sentiment from speech-to-text conversion, analyzing it using pre-trained models like bidirectional encoder representations from transformers (BERT), generative pre-trained transformer 2 (GPT-2), and logistic regression (LR). To improve individual model performance for both SER and SA, we employ an extended dynamic Bayesian mixture model (DBMM) ensemble classifier. Our most significant contribution is the development of a novel two-layered DBMM (2L-DBMM) for multimodal fusion. This model effectively integrates speech emotion and text sentiment, enabling the classification of more nuanced, second-level emotional states. Evaluating our framework on the EmoUERJ (Portuguese) and ESD (English) datasets, the extended DBMM achieves accuracy rates of 96% and 98% for SER, 85% and 95% for SA, and 96% and 98% for combined emotion classification using the 2L-DBMM, respectively. Our findings demonstrate the superior performance of the extended DBMM for individual modalities compared to individual classifiers and the 2L-DBMM for merging different modalities, highlighting the value of ensemble methods and multimodal fusion in affective communication analysis. The results underscore the potential of our approach in enhancing emotional understanding with broad applications in fields like mental health assessment, human–robot interaction, and cross-cultural communication.
dc.format.extent: 28
dc.format.extent: 6482999
dc.language.iso: eng
dc.relation.ispartof: Applied Sciences
dc.subject: affective communication
dc.subject: data fusion
dc.subject: sentiment analysis
dc.subject: machine learning
dc.subject: deep learning
dc.subject: dynamic Bayesian mixture model
dc.subject: multimodality
dc.subject: speech emotion recognition
dc.title: Multimodal Affective Communication Analysis: Fusing Speech Emotion and Text Sentiment Using Machine Learning
dc.contributor.institution: Department of Computer Science
dc.contributor.institution: School of Physics, Engineering & Computer Science
dc.description.status: Peer reviewed
rioxxterms.versionofrecord: 10.3390/app14156631
rioxxterms.type: Journal Article/Review
herts.preservation.rarelyaccessed: true
