Voice recognition systems are now common in everyday life, with applications ranging from virtual assistants on smartphones to voice-controlled home automation. This paper presents the design and implementation of a voice recognition security system based on artificial neural networks. The system was trained on a dataset of 900 audio samples collected from 10 distinct speakers, so that the resulting model can classify the speaker of a given audio sample. The system is implemented in Python. It uses the Keras library, which provides a high-level interface for building and training neural networks, with TensorFlow as the computational back-end. Flask, a Python web framework, was used to build a web application that serves as the user interface. Before training, the audio data is preprocessed: relevant features are extracted from the audio samples and each sample is labelled with its speaker. The neural network is then trained on this labelled dataset to learn to distinguish the speakers. Tested on previously unseen audio samples, the trained model achieved a classification accuracy exceeding 96%. The finalized model will be integrated into the web application, allowing users to upload an audio file and receive a prediction of the speaker's identity. The paper demonstrates the effectiveness of artificial neural networks for voice recognition and provides a practical framework for building such systems with readily available tools and libraries.
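The labelling and held-out evaluation steps described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' code: the helper names (`one_hot`, `stratified_split`, `accuracy`), the 20% test fraction, and the assumption of 90 samples per speaker are all hypothetical, since the abstract gives only the totals (900 samples, 10 speakers) and does not say how the data was split.

```python
import random
from collections import defaultdict

NUM_SPEAKERS = 10  # number of speakers stated in the abstract


def one_hot(index, num_classes=NUM_SPEAKERS):
    """Encode a speaker index as a one-hot list (a typical classifier target)."""
    if not 0 <= index < num_classes:
        raise ValueError(f"speaker index {index} is out of range")
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def stratified_split(labelled, test_fraction=0.2, seed=0):
    """Split (features, speaker_index) pairs into train and test lists.

    Each speaker contributes the same fraction of its samples to the test
    set. The 0.2 default is an assumption, not a value from the paper.
    """
    by_speaker = defaultdict(list)
    for item in labelled:
        by_speaker[item[1]].append(item)
    rng = random.Random(seed)
    train, test = [], []
    for speaker in sorted(by_speaker):
        items = list(by_speaker[speaker])
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


def accuracy(true_indices, predicted_indices):
    """Fraction of samples whose predicted speaker equals the true speaker."""
    if not true_indices or len(true_indices) != len(predicted_indices):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_indices, predicted_indices))
    return correct / len(true_indices)


# Placeholder data: 90 dummy feature vectors per speaker (assumed balance).
labelled = [([0.0], s) for s in range(NUM_SPEAKERS) for _ in range(90)]
train, test = stratified_split(labelled)
print(len(train), len(test))
```

Under these assumptions the split leaves 720 samples for training and 180 for testing, with every speaker equally represented in both sets. In a real pipeline the dummy feature vectors would be replaced by the features extracted during preprocessing, and `accuracy` would be applied to the model's predictions on the held-out set.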