
dc.contributor.author	Iyer, Shreyah
dc.contributor.author	Glackin, Cornelius
dc.contributor.author	Cannings, Nigel
dc.contributor.author	Veneziano, Vito
dc.contributor.author	Sun, Yi
dc.date.accessioned	2023-11-03T14:15:02Z
dc.date.available	2023-11-03T14:15:02Z
dc.date.issued	2022-09-30
dc.identifier.citation	Iyer, S., Glackin, C., Cannings, N., Veneziano, V. & Sun, Y. 2022, 'A Comparison Between Convolutional and Transformer Architectures for Speech Emotion Recognition', in 2022 International Joint Conference on Neural Networks (IJCNN), Institute of Electrical and Electronics Engineers (IEEE), Padua, Italy, 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 18/07/22. https://doi.org/10.1109/IJCNN55064.2022.9891882
dc.identifier.citation	conference
dc.identifier.isbn	978-1-6654-9526-4
dc.identifier.isbn	978-1-7281-8671-9
dc.identifier.uri	http://hdl.handle.net/2299/27076
dc.description	© 2022, IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. This is the accepted manuscript version of a conference paper which has been published in final form at https://doi.org/10.1109/IJCNN55064.2022.9891882
dc.description.abstract	Creating speech emotion recognition models comparable to the capability of how humans recognise emotions is a long-standing challenge in the field of speech technology with many potential commercial applications. As transformer-based architectures have recently become the state-of-the-art for many natural language processing related applications, this paper investigates their suitability for acoustic emotion recognition and compares them to the well-known AlexNet convolutional approach. This comparison is made using several publicly available speech emotion corpora. Experimental results demonstrate the efficacy of the different architectural approaches for particular emotions. The results show that the transformer-based models outperform their convolutional counterparts yielding F1-scores in the range [70.33%, 75.76%]. This paper further provides insights via dimensionality reduction analysis of output layer activations in both architectures and reveals significantly improved clustering in transformer-based models whilst highlighting the nuances with regard to the separability of different emotion classes.	en
dc.format.extent	8
dc.format.extent	382206
dc.language.iso	eng
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.ispartof	2022 International Joint Conference on Neural Networks (IJCNN)
dc.subject	alexnet
dc.subject	convolutional neural networks
dc.subject	mel spectrograms
dc.subject	speech emotion recognition
dc.subject	transfer learning
dc.subject	transformers
dc.subject	wav2vec2
dc.subject	Software
dc.subject	Artificial Intelligence
dc.title	A Comparison Between Convolutional and Transformer Architectures for Speech Emotion Recognition	en
dc.contributor.institution	Department of Computer Science
dc.contributor.institution	School of Physics, Engineering & Computer Science
dc.contributor.institution	Centre for Computer Science and Informatics Research
dc.contributor.institution	Department of Pharmacy, Pharmacology and Postgraduate Medicine
dc.contributor.institution	Biocomputation Research Group
dc.date.embargoedUntil	2022-09-30
dc.identifier.url	http://www.scopus.com/inward/record.url?scp=85140725350&partnerID=8YFLogxK
rioxxterms.versionofrecord	10.1109/IJCNN55064.2022.9891882
rioxxterms.type	Other
herts.preservation.rarelyaccessed	true
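
For readers who want a concrete picture of the kind of pipeline the abstract describes, the Python fragment below is an illustrative sketch only, not the authors' code: it shows the two common acoustic front ends the paper compares, log-mel spectrograms as image-like input for an AlexNet-style CNN and mean-pooled wav2vec2 hidden states as a transformer-based utterance feature. The model checkpoint, sampling rate, and pooling choice are assumptions; the paper's actual training details are in the full text linked above.

# Illustrative sketch: two feature front ends for speech emotion recognition.
# Checkpoint name, sampling rate, and mean pooling are assumptions, not the paper's setup.
import librosa
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

def log_mel_spectrogram(path, sr=16000, n_mels=128):
    # Log-mel spectrogram: the usual 2-D "image" input for an AlexNet-style CNN.
    audio, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)

def wav2vec2_embedding(path, model_name="facebook/wav2vec2-base"):
    # Mean-pooled wav2vec2 hidden states: a common transformer-based utterance feature.
    audio, sr = librosa.load(path, sr=16000)
    extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_name)
    model = Wav2Vec2Model.from_pretrained(model_name)
    inputs = extractor(audio, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape (1, frames, 768)
    return hidden.mean(dim=1).squeeze(0)            # utterance-level vector

Either representation would then feed a classifier over the emotion labels; in a transfer-learning setting such as the one the subject terms suggest, the pretrained backbone is typically frozen or fine-tuned while only a small head is trained.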

