This research aims to develop a robust Application Programming Interface (API)-based Artificial Intelligence (AI) system that removes noise from audio signals, enhancing speech quality and intelligibility in noisy environments so that the cleaned audio can be fed into downstream AI models that assess applicant interviews. The proposed methodology combines signal processing techniques and noise reduction algorithms with AI models trained on clean voice data and noise patterns. It relies on two key components: the Wiener filter and a Convolutional Neural Network (CNN). The Wiener filter serves as the foundational noise reduction technique, exploiting the statistical properties of the signal and the noise to suppress unwanted noise components. The CNN is integrated to classify audio as clean or noisy. Five optimizers (Adam, SGD, RMSprop, Adagrad, and Adadelta) are evaluated to identify the one best suited to this classification task. Each optimizer is evaluated with both cross-validation and hold-out validation under the same batch size (25) and number of epochs (25). The study demonstrates that the Adam optimizer yields the best results. The number of epochs was then tuned over 35, 75, 105, and 125, and 105 epochs was selected, giving an Accuracy of 99.52%, Recall of 100%, F1-Score of 99.50%, and ROC AUC of 99.99% for cross-validation, and an Accuracy of 98.79%, Recall of 99.21%, F1-Score of 98.81%, and ROC AUC of 99.54% for hold-out validation, significantly improving AI model performance. Lastly, the batch size was tuned over 25, 50, 75, and 125 using the selected optimizer and number of epochs; a batch size of 25 yielded the best accuracy. The CNN also includes L2 kernel regularization to avoid overfitting.
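The Wiener filtering step described above can be illustrated with a minimal sketch. The abstract does not state how the filter is implemented, so everything below is an assumption made for illustration: the function name `wiener_denoise`, the synthetic two-tone test signal, the white Gaussian noise model, and the premise that the noise variance is known in advance. In practice the noise power would more likely be estimated from a speech-free segment of the recording.

```python
import numpy as np


def wiener_denoise(noisy, noise_power, eps=1e-12):
    """Frequency-domain Wiener filter (illustrative sketch, not the paper's code).

    noisy:       1-D array holding the noisy signal.
    noise_power: expected noise power per FFT bin (assumed known here).
    """
    spectrum = np.fft.rfft(noisy)
    observed_power = np.abs(spectrum) ** 2
    # Estimate the clean-signal power by subtracting the noise power,
    # flooring at zero so the gain never becomes negative.
    signal_power = np.maximum(observed_power - noise_power, 0.0)
    # Wiener gain: signal power / (signal power + noise power), per bin.
    gain = signal_power / (signal_power + noise_power + eps)
    return np.fft.irfft(gain * spectrum, n=len(noisy))


# Hypothetical test signal: two sine tones plus white Gaussian noise.
rng = np.random.default_rng(0)
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
clean = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)
noise_std = 0.5
noisy = clean + rng.normal(0.0, noise_std, size=t.shape)

# For white noise with variance sigma^2, the expected squared magnitude
# of each unnormalized FFT bin is N * sigma^2.
noise_power = len(noisy) * noise_std ** 2

denoised = wiener_denoise(noisy, noise_power)
mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)
print(f"MSE before filtering: {mse_noisy:.4f}")
print(f"MSE after filtering:  {mse_denoised:.4f}")
```

The gain is close to one in bins dominated by the signal and close to zero in bins dominated by noise, which is the sense in which the filter exploits the statistical properties of both. Real speech is non-stationary, so a practical system would typically apply the same gain rule frame by frame on a short-time spectrum rather than on the whole recording at once.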