Abstract
Automated telephony surveillance is an appealing technology for detecting (spotting) whether a person from a specific watch-list is on the line. Such an automatic solution is of considerable interest in homeland security, where a potentially large number of wire-tapped conversations may have to be processed in parallel, across different deployment scenarios and demographic conditions, and typically against large watch-lists, all of which make manual lawful interception tedious, unmanageable and perhaps even impossible. In this chapter, we first introduce the problem domain, starting with a sketch of a glamorous fictitious example, followed by an outline of lawful interception and wire-tapping; we then take a brief look at a similar watch-list based negative recognition application that uses the now very successful iris biometrics, and consider equivalent scenarios for speaker-spotting based on voice as a biometric. In the main body of the chapter, we first provide the basic framework for watch-list based speaker-spotting, namely open-set speaker identification, which is subsequently refined into a ‘multi-target detection’ framework. We then examine in some detail the main theoretical analysis available within the framework of multi-target identification, leading to performance predictions for such systems with the watch-list size as the critical factor. On a related note, we also briefly touch on the prioritization mode of operation, which likewise lends itself to interesting theoretical analysis and performance predictions.
Speaker-spotting systems face unique challenges, combining the difficulties inherent in conventional speaker authentication applications with those of forensic speaker recognition applications. We consider these challenges, using the NIST SRE evaluation results to gain insight into the performance currently achievable and into the latent performance limitations, which seem to warrant a cautious approach before speaker recognition technology can be widely used for surveillance applications. In the later part of the chapter, we outline related topics such as speaker change detection, speaker segmentation and speaker diarization, followed by a summary of product-level solutions currently available for surveillance and homeland security applications. We conclude with a discussion highlighting the state of the art and potential future research directions.