ATA mining is concerned with the science, technology, and engineering of discovering patterns and extracting potentially useful or interesting information automatically or semi-automatically from data. Data mining was introduced in the 1990s and has deep roots in the fields of statistics, artificial intelligence, and machine learning. With the advent of inexpensive storage space and faster processing over the past decade or so, data mining research has started to penetrate new grounds in areas of speech and audio processing as well as spoken language dialog. It has been fueled by the influx of audio data that are becoming more widely available from a variety of multimedia sources including webcasts, conversations, music, meetings, voice messages, lectures, television, and radio. Algorithmic advances in automatic speech recognition have also been a major, enabling technology behind the growth in data mining. Current state-of-the-art, large-vocabulary, continuous speech recognizers are now trained on a record amount of data—several hundreds of millions of words and thousands of hours of speech. Pioneering research in robust speech processing, large-scale discriminative training, finite state automata, and statistical hidden Markov modeling have resulted in real-time recognizers that are able to transcribe spontaneous speech with a word accuracy exceeding 85%. With this level of accuracy, the technology is now highly attractive for a variety of speech mining applications. Speech mining research includes many ways of applying machine learning, speech processing, and language processing algorithms to benefit and serve commercial applications. It also raises and addresses several new and interesting fundamental research challenges in the areas of prediction, search, explanation, learning, and language understanding. These basic challenges are becoming increasingly important in revolutionizing business processes by providing essential sales and marketing information about services, customers, and product offerings. They are also enabling a new class of learning systems to be created that can infer knowledge and trends automatically from data, analyze and report application performance, and adapt and improve over time with minimal or zero human involvement. Effective techniques for mining speech, audio, and dialog data can impact numerous business and government applications. The technology for monitoring conversational speech to discover patterns, capture useful trends, and generate alarms is essential for intelligence and law enforcement organizations as well as for enhancing call center operation. It is useful for an
Read full abstract