Abstract

In recent years, thanks to advances in imaging technology, high-speed images of the vocal cords have come into wide use for detecting disorders of the speech system and for analyzing speech. These high-speed images contain detailed information about the vibration of the speaker's vocal cords. However, given the volume of the image data, manual processing is not feasible. For this reason, glottis detection and segmentation from vocal cord images has become popular with the development of automatic image processing algorithms in recent years. Unlike other studies in the literature, this study examines the accuracy, precision, recall (sensitivity), F1-score, and equal error rate as performance criteria for the automatic pixel-based classification of vocal cord images. In addition, the deep artificial neural network, the pixel-classification baseline model from the literature, is compared with the newly proposed Gaussian Mixture Model. 3000 manually segmented high-speed endoscopic camera images of 256x256 pixels were randomly split into training, development, and evaluation data sets. In experiments on the validation and evaluation sets of the trained models, the accuracy, precision, recall, and F1-score criteria commonly used in binary classification varied by only about 1% from model to model. This result shows that these metrics are not as discriminative as the equal error rate, which changed by 22% in the same situation and thus better reflects differences between systems. Consequently, even when the accuracy values of systems remain the same, their equal error rates may differ, so overfitted systems can be identified more reliably. Compared with the baseline system, the proposed 4096-component Gaussian Mixture Model gave the best result on all performance criteria, improving the equal error rate by 22% on the evaluation set.
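To make the metric definitions above concrete, the following is a minimal NumPy sketch (not the authors' code) of the five criteria over flattened pixel labels; the brute-force threshold sweep used for the equal error rate is an illustrative assumption, not the paper's method, and the toy data at the end is synthetic.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    # Confusion-matrix counts over flattened pixel labels (0 = background, 1 = glottis).
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0   # also called sensitivity
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

def equal_error_rate(y_true, scores):
    # Sweep thresholds over the classifier scores and return the point where
    # the false-acceptance rate (FAR) and false-rejection rate (FRR) are closest.
    best_far, best_frr = 1.0, 0.0
    for t in np.unique(scores):
        pred = scores >= t
        far = np.mean(pred[y_true == 0])    # negative pixels wrongly accepted
        frr = np.mean(~pred[y_true == 1])   # positive pixels wrongly rejected
        if abs(far - frr) < abs(best_far - best_frr):
            best_far, best_frr = far, frr
    return (best_far + best_frr) / 2.0

# Toy usage on a synthetic 256x256 "image": hypothetical scores, not real results.
rng = np.random.default_rng(0)
y_true = (rng.random(256 * 256) < 0.2).astype(int)    # ~20% glottis pixels
scores = rng.random(256 * 256) * 0.5 + y_true * 0.3   # noisy, weakly informative scores
print(binary_metrics(y_true, (scores >= 0.5).astype(int)))
print(equal_error_rate(y_true, scores))
```

The sketch illustrates why the abstract's finding is plausible: accuracy, precision, recall, and F1 are computed at a single fixed threshold, while the equal error rate sweeps all thresholds, so it can expose differences between models that a single operating point hides.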
