Abstract

Spoken Language Identification (SLID) is the task of determining which language is spoken in an audio clip. SLID is valuable in multilingual speech recognition systems, personalized voice assistants, and automated speech translation. In call centers, for example, it can route calls automatically to an operator who speaks the caller's language. A primary challenge is detecting the language accurately and with low latency from audio recorded at different noise levels and sampling rates. A further challenge is discriminating between languages in short-duration utterances. Previous research on SLID has exploited lexical, phonetic, phonotactic, and prosodic features. Deep learning (DL) approaches to spoken language detection usually train RNNs or CNNs on audio features such as spectrograms or MFCCs to classify the language spoken in each sample. More recent architectures, such as transformers and hybrid CNN–RNN models, can capture both spatial and temporal features for better performance. This paper presents a Multi-Class Spoken Language Detection using Artificial Intelligence with Fractal Al-Biruni Earth Radius Optimization (MCSLD-AIBER) technique, which aims to identify multiple classes of spoken languages. The MCSLD-AIBER technique first applies the Constant-Q Transform (CQT) to transform the speech signals. It then employs an Inception with Residual Network model for feature extraction, and the hyperparameters are tuned using the BER approach. Finally, a long short-term memory (LSTM) network classifies the spoken language. A series of experiments was conducted to evaluate the MCSLD-AIBER technique, and the simulation results indicate that it outperforms the other models compared.
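The first stage of the pipeline turns speech into a constant-Q transform (CQT) representation. The abstract does not give the CQT settings, so the sketch below is a minimal, direct (unoptimized) constant-Q transform of a single frame in plain Python. It assumes geometrically spaced bins with a shared quality factor Q. Each bin gets a Hann-windowed analysis window whose length shrinks as the centre frequency rises. The function name `cqt_frame` and the parameters `f_min`, `bins_per_octave`, and `n_bins` are hypothetical illustrations, not the authors' configuration.

```python
import cmath
import math


def cqt_frame(signal, sample_rate, f_min, bins_per_octave, n_bins):
    """Direct constant-Q transform of one frame starting at sample 0.

    Hypothetical helper for illustration. Bin k is centred at
    f_min * 2**(k / bins_per_octave); every bin shares the same quality
    factor Q, so low-frequency bins use longer analysis windows.
    Returns one magnitude per bin.
    """
    # Constant ratio of centre frequency to bandwidth.
    q = 1.0 / (2 ** (1.0 / bins_per_octave) - 1)
    magnitudes = []
    for k in range(n_bins):
        f_k = f_min * 2 ** (k / bins_per_octave)
        # Window length for bin k: Q cycles of f_k at this sample rate.
        n_k = int(math.ceil(q * sample_rate / f_k))
        if n_k > len(signal):
            raise ValueError(f"signal too short for bin {k}: need {n_k} samples")
        acc = 0j
        for n in range(n_k):
            w = 0.5 - 0.5 * math.cos(2 * math.pi * n / n_k)  # Hann window
            # Correlate with a complex exponential at roughly f_k Hz.
            acc += w * signal[n] * cmath.exp(-2j * math.pi * q * n / n_k)
        magnitudes.append(abs(acc) / n_k)
    return magnitudes


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(2000)]
    mags = cqt_frame(tone, sr, f_min=110.0, bins_per_octave=12, n_bins=36)
    peak = max(range(len(mags)), key=lambda i: mags[i])
    # 440 Hz lies two octaves above 110 Hz, i.e. bin 2 * 12 = 24.
    print(peak)
```

The direct form costs a sum of all window lengths per frame, which is fine for illustration but slow in practice. Real feature extractors typically use FFT-based spectral kernels and hop across successive frames to build the time–frequency image that a CNN such as the Inception-style extractor would consume.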
