Abstract
This research presents a comprehensive strategy for improving the performance of an existing automatic speech recognition (ASR) model documented in previously published articles. The study has several objectives. First, it updates the ASR model by retraining it on new data: samples from the latest Common Voice corpus release and data collected independently through the armspeech.com web application. Second, it optimizes the model for near-real-time processing; the proposed architectural adjustments aim to balance accuracy and processing speed, which is essential for applications that require prompt speech recognition. Third, it explores integrating Transformer models into the post-processing pipeline to add punctuation and capitalization to the ASR output, which improves the readability and usability of transcriptions. Alongside these advancements, the research presents a systematic approach to gathering, annotating, and storing datasets tailored to punctuation and capitalization tasks, so that they are suitable for training Transformer models. Together, dataset enrichment, architectural modifications, and post-processing enhancements aim to improve the ASR model’s accuracy, speed, and linguistic quality, with a particular focus on the intricacies of the Armenian language. The research contributes insights into the optimization of ASR systems, addressing both language-specific challenges and broader issues in linguistic post-processing.