Efficient Search Mechanism from Large Scale Corpora for Domain-Specific Language Modeling in Speech Recognition

doi:10.35940/ijeat.f8416.088619

Abstract

With the Internet and the World Wide Web revolution, large corpora in a variety of forms are growing continuously and can be treated as big data. One important application of such large corpora is language modeling for large vocabulary continuous speech recognition. Language modeling is an indispensable module in the speech recognition architecture and plays a vital role in reducing the search space during the recognition process. Additionally, a language model that is close to the domain of the speech can shrink the search space and improve the recognition accuracy. In this paper, an efficient search mechanism for domain-specific document retrieval from large corpora is presented using Elasticsearch, a distributed and efficient search engine for big data. This helped us adapt the language model to the domain and reduced the search time by more than 90% compared with the conventional search and retrieval mechanism used in our earlier work. A word-level and a phrase-level retrieval process for creating the domain-specific language model has been implemented. The system is evaluated on the basis of the word error rate (WER) and perplexity (PPL) of the speech recognition system. The results show a nearly 10% decrease in WER and a major reduction in PPL, which helped boost the performance of the speech recognition process. From the results, it can be concluded that Elasticsearch is an efficient mechanism for domain-specific document retrieval from large corpora compared with using topic modeling toolkits.
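
For readers unfamiliar with the two evaluation metrics named above, the following is a minimal sketch of their standard textbook definitions: WER as word-level edit distance divided by the reference length, and PPL as the exponential of the average negative log-probability per word. The function names `word_error_rate` and `perplexity` and the example sentences are illustrative assumptions; the abstract does not say which scoring tools the authors used, so this should not be read as their implementation.

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def perplexity(word_probs) -> float:
    """PPL = exp(-(1/N) * sum(log p_i)) for per-word model probabilities p_i."""
    probs = list(word_probs)
    if not probs or any(p <= 0.0 for p in probs):
        raise ValueError("need at least one strictly positive probability")
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))


if __name__ == "__main__":
    # Hypothetical sentences: one substitution and one deletion against 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # A model that assigns probability 0.25 to every word.
    print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

Lower values are better for both metrics: a lower WER means fewer recognition errors per reference word, and a lower PPL means the language model is less surprised by the test text.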

More From: International Journal of Engineering and Advanced Technology

Similar Papers

Leveraging relevance cues for language modeling in speech recognition
Berlin Chen ... Kuan-Yu Chen
Information Processing and Management | VOL. 49
28 Feb 2013

Effective pseudo-relevance feedback for language modeling in speech recognition
Berlin Chen ... Kuan-Yu Chen
01 Dec 2013

Minimum word error training of long short-term memory recurrent neural network language models for speech recognition
Takaaki Hori ... Chiori Hori
01 Mar 2016

Investigating Bidirectional Recurrent Neural Network Language Models for Speech Recognition
X Chen ... X Liu
20 Aug 2017
