Open-set Speaker Identification
Abstract
This study is motivated by the growing need for effective extraction of intelligence and evidence from audio recordings in the fight against crime, a need made ever more apparent by the recent expansion of criminal and terrorist organisations. The main focus is to enhance the open-set speaker identification process within speaker identification systems, which are affected by noisy audio data obtained in uncontrolled environments such as the street, restaurants or other places of business. Two investigations are carried out initially. The first examines the effects of environmental noise on the accuracy of open-set speaker recognition, covering the conditions relevant to the considered application areas, such as variable training data length, background noise and real-world noise. The second examines the effects of short and varied duration reference data in open-set speaker recognition.
The investigations led to a novel method termed “vowel boosting”, which enhances the reliability of speaker identification when operating with varied duration speech data under uncontrolled conditions. Vowels naturally contain more speaker-specific information, so emphasising them in the speech data enables better identification performance. The traditional state-of-the-art GMM-UBM and i-vector approaches are used to evaluate “vowel boosting”. The proposed approach boosts the impact of the vowels on the speaker scores, which improves the recognition accuracy for the specific case of open-set identification with short and varied duration speech material.
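The abstract does not give the formulation of “vowel boosting”, so the following is only a minimal sketch of the general idea, under stated assumptions: each test frame already has a log-likelihood ratio against a background model, a vowel/non-vowel label is available per frame, and vowel frames simply receive a larger weight in the averaged speaker score. The function names, the weighting scheme, the weight value and the threshold are all hypothetical and are not taken from the thesis. The second function shows the open-set decision itself: the best-scoring enrolled speaker is accepted only if the score clears a threshold, otherwise the test speaker is treated as unknown.

```python
from typing import Dict, Optional, Sequence


def boosted_score(frame_llrs: Sequence[float],
                  is_vowel: Sequence[bool],
                  vowel_weight: float = 2.0) -> float:
    """Weighted mean of per-frame log-likelihood ratios for one speaker.

    Vowel frames get weight `vowel_weight`, all other frames get 1.0.
    The weighting scheme and default value are illustrative assumptions.
    """
    if len(frame_llrs) != len(is_vowel):
        raise ValueError("frame_llrs and is_vowel must have the same length")
    if not frame_llrs:
        raise ValueError("at least one frame is required")
    weights = [vowel_weight if v else 1.0 for v in is_vowel]
    total = sum(w * s for w, s in zip(weights, frame_llrs))
    return total / sum(weights)


def identify_open_set(scores: Dict[str, float],
                      threshold: float) -> Optional[str]:
    """Return the best-scoring enrolled speaker, or None for an unknown speaker."""
    best = max(scores, key=lambda name: scores[name])
    return best if scores[best] >= threshold else None


# Toy data (made up): four frames, the first two labelled as vowels.
is_vowel = [True, True, False, False]
frame_llrs = {
    "spk_a": [1.0, 1.0, -1.0, -1.0],
    "spk_b": [0.2, 0.2, 0.2, 0.2],
}

boosted = {name: boosted_score(llrs, is_vowel, vowel_weight=2.0)
           for name, llrs in frame_llrs.items()}
unboosted = {name: boosted_score(llrs, is_vowel, vowel_weight=1.0)
             for name, llrs in frame_llrs.items()}

decision_boosted = identify_open_set(boosted, threshold=0.25)
decision_unboosted = identify_open_set(unboosted, threshold=0.25)
```

In this toy case the extra weight on vowel frames lifts the score of `spk_a` above the threshold, whereas with uniform weights no enrolled speaker is accepted. The example only illustrates how such a weighting can change an open-set decision. It says nothing about accuracy on real data.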
Publication date
2018-12-20
Published version
https://doi.org/10.18745/th.21828
Other links
http://hdl.handle.net/2299/21828
Related items
Showing items related by title, author, creator and subject.
- Effectiveness of speaker-dependent feature score pruning in speaker verification
  Pillay, S.G.; Ariyaeeinia, A.; Pawlewski, M. (Institute of Electrical and Electronics Engineers (IEEE), 2008)
- Open-Set Speaker Identification under Mismatch Conditions
  Pillay, S.G.; Ariyaeeinia, A.; Sivakumaran, P.; Pawlewski, M. (2009)
  This paper presents investigations into the performance of open-set, text-independent speaker identification (OSTI-SI) under mismatched data conditions. The scope of the study includes attempts to reduce the adverse effects ...
- Speaker-specific formant dynamics: an experiment on Australian English /aɪ/
  McDougall, Kirsty (2004)
  Formant frequency dynamics are relevant to forensic speaker identification since they are determined by the shape and size of a speaker’s vocal tract and the way he or she configures the articulators for speech. This study ...