Open-set Speaker Identification
This study is motivated by the growing need to extract intelligence and evidence effectively from audio recordings in the fight against crime, a need made ever more apparent by the recent expansion of criminal and terrorist organisations. The main focus is to enhance the open-set identification process within speaker identification systems, which are affected by noisy audio data obtained in uncontrolled environments such as streets, restaurants or other places of business. Two investigations are carried out initially. The first examines the effects of environmental noise on the accuracy of open-set speaker recognition and thoroughly covers the conditions relevant to the considered application areas, such as variable training data length, background noise and real-world noise. The second examines the effects of short and varied duration reference data in open-set speaker recognition. These investigations led to a novel method, termed “vowel boosting”, that enhances the reliability of speaker identification when operating with varied duration speech data under uncontrolled conditions. Vowels naturally contain more speaker-specific information, so emphasising them in the speech data enables better identification performance. “Vowel boosting” is evaluated using the established GMM-UBM and i-vector approaches. The proposed approach boosts the impact of the vowels on the speaker scores, which improves recognition accuracy for the specific case of open-set identification with short and varied duration speech material.
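The abstract does not specify how the vowel emphasis is applied to the speaker scores. As a purely illustrative sketch, one way such a scheme could work is to weight vowel frames more heavily when averaging per-frame log-likelihood ratios (speaker model against UBM), and then to apply an open-set threshold to the best-scoring speaker. The function names, the `boost` weight, the per-frame vowel flags and the threshold below are all hypothetical and are not taken from the study.

```python
def boosted_score(frame_llrs, is_vowel, boost=2.0):
    """Weighted mean of per-frame log-likelihood ratios.

    Assumption (not from the study): vowel frames receive weight `boost`,
    all other frames receive weight 1.0.
    """
    if len(frame_llrs) != len(is_vowel):
        raise ValueError("frame_llrs and is_vowel must have the same length")
    if not frame_llrs:
        raise ValueError("at least one frame is required")
    weights = [boost if vowel else 1.0 for vowel in is_vowel]
    weighted_sum = sum(w * s for w, s in zip(weights, frame_llrs))
    return weighted_sum / sum(weights)


def identify_open_set(scores_by_speaker, threshold):
    """Return the best-scoring enrolled speaker, or None if the best score
    falls below the threshold (the test speaker is treated as unknown)."""
    best = max(scores_by_speaker, key=scores_by_speaker.get)
    return best if scores_by_speaker[best] >= threshold else None


# Hypothetical usage: two frames, the first flagged as a vowel.
score = boosted_score([1.0, -1.0], [True, False], boost=3.0)
decision = identify_open_set({"spk_a": score, "spk_b": -0.2}, threshold=0.3)
```

With `boost=1.0` the score reduces to the ordinary frame average, so the weight isolates the effect of the vowel emphasis in this sketch.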
Showing items related by title, author, creator and subject.
Pillay, S.G.; Ariyaeeinia, A.; Pawlewski, M. (Institute of Electrical and Electronics Engineers (IEEE), 2008)
Pillay, S.G.; Ariyaeeinia, A.; Sivakumaran, P.; Pawlewski, M. (2009) This paper presents investigations into the performance of open-set, text-independent speaker identification (OSTI-SI) under mismatched data conditions. The scope of the study includes attempts to reduce the adverse effects ...
Murray, J.C.; Cañamero, Lola (2009) In this paper we present a socially interactive multi-modal robotic head, ERWIN - Emotional Robot With Intelligent Networks, capable of emotion expression and interaction via speech and vision. The model presented shows ...