
dc.contributor.author	Iyer, Shreyah
dc.contributor.author	Glackin, Cornelius
dc.contributor.author	Cannings, Nigel
dc.contributor.author	Veneziano, Vito
dc.contributor.author	Sun, Yi
dc.date.accessioned	2023-11-03T14:15:02Z
dc.date.available	2023-11-03T14:15:02Z
dc.date.issued	2022-09-30
dc.identifier.citation	Iyer, S, Glackin, C, Cannings, N, Veneziano, V & Sun, Y 2022, A Comparison Between Convolutional and Transformer Architectures for Speech Emotion Recognition. in 2022 International Joint Conference on Neural Networks (IJCNN). Institute of Electrical and Electronics Engineers (IEEE), Padua, Italy, 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 18/07/22. https://doi.org/10.1109/IJCNN55064.2022.9891882
dc.identifier.citation	conference
dc.identifier.isbn	978-1-6654-9526-4
dc.identifier.isbn	978-1-7281-8671-9
dc.identifier.uri	http://hdl.handle.net/2299/27076
dc.description	© 2022, IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. This is the accepted manuscript version of a conference paper which has been published in final form at https://doi.org/10.1109/IJCNN55064.2022.9891882
dc.description.abstract	Creating speech emotion recognition models comparable to the human capability for recognising emotions is a long-standing challenge in the field of speech technology, with many potential commercial applications. As transformer-based architectures have recently become the state of the art for many natural language processing applications, this paper investigates their suitability for acoustic emotion recognition and compares them to the well-known AlexNet convolutional approach. This comparison is made using several publicly available speech emotion corpora. Experimental results demonstrate the efficacy of the different architectural approaches for particular emotions. The results show that the transformer-based models outperform their convolutional counterparts, yielding F1-scores in the range [70.33%, 75.76%]. This paper further provides insights via dimensionality reduction analysis of output layer activations in both architectures, revealing significantly improved clustering in transformer-based models whilst highlighting the nuances with regard to the separability of different emotion classes.
dc.format.extent	8
dc.format.extent	382206
dc.language.iso	eng
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.ispartof	2022 International Joint Conference on Neural Networks (IJCNN)
dc.subject	alexnet
dc.subject	convolutional neural networks
dc.subject	mel spectrograms
dc.subject	speech emotion recognition
dc.subject	transfer learning
dc.subject	transformers
dc.subject	wav2vec2
dc.subject	Software
dc.subject	Artificial Intelligence
dc.title	A Comparison Between Convolutional and Transformer Architectures for Speech Emotion Recognition
dc.contributor.institution	Department of Computer Science
dc.contributor.institution	School of Physics, Engineering & Computer Science
dc.contributor.institution	Centre for Computer Science and Informatics Research
dc.contributor.institution	Biocomputation Research Group
dc.contributor.institution	Cybersecurity and Computing Systems
dc.date.embargoedUntil	2022-09-30
dc.identifier.url	http://www.scopus.com/inward/record.url?scp=85140725350&partnerID=8YFLogxK
rioxxterms.versionofrecord	10.1109/IJCNN55064.2022.9891882
rioxxterms.type	Other
herts.preservation.rarelyaccessed	true
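
The abstract above contrasts a convolutional approach (AlexNet with transfer learning on mel spectrograms) with a transformer approach (wav2vec2), followed by a dimensionality reduction analysis of output layer activations. The sketch below is a minimal illustration of those two model families, not the authors' implementation: the checkpoint name (facebook/wav2vec2-base), the 16 kHz sampling rate, the four-class emotion set, the mean-pooling classification head, and the choice of t-SNE for the reduction step are all assumptions not stated in this record.

```python
# Hypothetical sketch of the two architectures compared in the paper.
# Corpus loading, labels, and training are omitted.

import torch
import torch.nn as nn
import torchaudio
from torchvision.models import alexnet, AlexNet_Weights
from transformers import Wav2Vec2Model
from sklearn.manifold import TSNE

NUM_EMOTIONS = 4  # assumption: e.g. angry / happy / sad / neutral

# --- Convolutional baseline: AlexNet fine-tuned on log-mel spectrograms ---
mel = torchaudio.transforms.MelSpectrogram(sample_rate=16_000, n_mels=128)
cnn = alexnet(weights=AlexNet_Weights.DEFAULT)          # ImageNet transfer learning
cnn.classifier[6] = nn.Linear(4096, NUM_EMOTIONS)       # replace the final layer

def alexnet_logits(waveform: torch.Tensor) -> torch.Tensor:
    """waveform: (batch, samples) mono audio at 16 kHz."""
    spec = mel(waveform)                                  # (batch, n_mels, frames)
    spec = torch.log(spec + 1e-6)                         # log-mel features
    spec = spec.unsqueeze(1).repeat(1, 3, 1, 1)           # 3 channels for ImageNet weights
    spec = nn.functional.interpolate(spec, size=(224, 224))  # AlexNet input size
    return cnn(spec)

# --- Transformer approach: wav2vec2 encoder + pooled classification head ---
class Wav2Vec2EmotionClassifier(nn.Module):
    def __init__(self, num_emotions: int = NUM_EMOTIONS):
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
        self.head = nn.Linear(self.encoder.config.hidden_size, num_emotions)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        hidden = self.encoder(waveform).last_hidden_state  # (batch, frames, dim)
        pooled = hidden.mean(dim=1)                        # mean over time
        return self.head(pooled)

# --- Dimensionality reduction of output-layer activations (t-SNE assumed) ---
def project_activations(logits: torch.Tensor):
    """2-D embedding of output activations for inspecting emotion clusters."""
    return TSNE(n_components=2).fit_transform(logits.detach().cpu().numpy())
```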

