Abstract
Protecting sensitive information while preserving the shareability and usability of data is becoming increasingly important in the outsourced business process industry. In call centers in particular, a large amount of sensitive customer information is stored in audio recordings. In this work, we address the problem of protecting sensitive customer information in audio recordings and Automatic Speech Recognition (ASR) transcripts. High word error rates, the spontaneous nature of the communication, and the variability of agent-customer interaction make it difficult and expensive to craft rules or build annotators that detect sensitive information. We propose a semi-supervised method that models sensitive information as a directed graph generated automatically from ASR transcripts. Vocabularies specific to the nodes are generated using features of context-sensitive clusters. The direction and weight of each edge capture the ordering and timing constraints, respectively, on these features. These constraints are learned from the timestamps associated with the ASR transcripts. The effectiveness of the approach is demonstrated by applying it to the problem of detecting and locating credit card transactions in real-life conversations between agents and customers of a call center.
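To make the graph model concrete, the following is a minimal Python sketch of how a directed graph with per-node vocabularies and per-edge timing limits could be matched against time-stamped ASR tokens. It is an illustration under stated assumptions, not the method evaluated in this work: the node names, vocabularies, and maximum gaps are hypothetical and hard-coded here, whereas the abstract describes them as generated from context-sensitive clusters and learned from transcript timestamps. The helper names `locate` and `_extend` are likewise invented for this example.

```python
# Hypothetical node vocabularies (assumed values, not taken from the paper).
NODES = {
    "card_type": {"visa", "mastercard", "amex"},
    "card_number": {"number", "digits"},
    "expiry": {"expiry", "expiration", "expires"},
}

# Directed edges: (source, target) -> maximum allowed gap in seconds.
# Direction encodes ordering; the weight encodes the timing constraint.
# The numbers are illustrative assumptions.
EDGES = {
    ("card_type", "card_number"): 15.0,
    ("card_number", "expiry"): 30.0,
}


def _extend(tokens, start_idx, prev_time, path, pos):
    """Try to match path[pos:] using tokens[start_idx:].

    Returns the timestamp of the last matched token, or None.
    """
    if pos == len(path):
        return prev_time
    max_gap = EDGES[(path[pos - 1], path[pos])]
    vocab = NODES[path[pos]]
    for i in range(start_idx, len(tokens)):
        word, t = tokens[i]
        if t - prev_time > max_gap:
            break  # tokens are time-ordered, so later ones are too late as well
        if word.lower() in vocab:
            end = _extend(tokens, i + 1, t, path, pos + 1)
            if end is not None:
                return end
    return None


def locate(tokens, path):
    """Return (start_time, end_time) of the first span matching `path`.

    `tokens` is a time-ordered list of (word, start_time_in_seconds) pairs,
    and `path` is a list of node names connected by edges in EDGES.
    """
    for i, (word, t) in enumerate(tokens):
        if word.lower() in NODES[path[0]]:
            end = _extend(tokens, i + 1, t, path, 1)
            if end is not None:
                return (t, end)
    return None


transcript = [
    ("hello", 0.0), ("visa", 2.0), ("card", 2.5),
    ("number", 3.0), ("four", 3.5), ("expiry", 10.0),
]
span = locate(transcript, ["card_type", "card_number", "expiry"])
```

In this sketch the returned span marks the time interval that a downstream step could mask or mute in the recording. Because matching relies on vocabulary membership and timing rather than exact phrases, a misrecognized word between two nodes does not by itself prevent a match.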