Methods of automatic speaker identification by voice

М В Ткаченко, Р М Федоренко, Ю В Кондратенко, І Г Зотова

doi:10.33099/2304-2745/2018-3-64/131-135

Abstract

Every person has individual voice characteristics, determined by the structure of their vocal organs. In conversation, people distinguish the voices of others subconsciously, but for computers this task is non-trivial and requires focused research. The purpose of the article is to analyze existing methods of speech information recognition and to identify their strengths and weaknesses, in order to justify the choice of the method best suited to recognizing a speaker by voice. The growth of the global market for voice recognition devices depends on many factors, one of the main ones being the rising demand for voice biometrics services. As security breaches grow in complexity and frequency, security remains one of the main requirements for the Armed Forces of Ukraine. Because a voice is unique to each person, voice biometrics is in high demand as a means of establishing a person’s identity. Military departments in most countries maintain highly restricted areas to keep intruders out, and they use voice recognition systems to ensure secrecy and security in those areas. Any recognition system works in two modes: registration and identification. In other words, a reference sample of the voice must be available before a speaker can be identified. A number of methods currently address text-independent speaker identification by voice, and each has its own advantages and disadvantages. The most common of them is the Gaussian Mixture Model (GMM). Gaussian mixture models have proven themselves as stochastic models for building recognition systems. They are convenient for modeling not only the characteristics of the speaker’s voice but also the recording channel and the environment.
An effective speech recognition system should include the following steps in processing the input signal: noise removal, segmentation, selection of voiced sections, parameterization, recognition, and correction with a feedback dictionary.
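
The two modes mentioned above, registration and identification, can be illustrated with a minimal Gaussian mixture sketch. It is not the authors' implementation. It assumes that parameterization has already turned each utterance into a list of feature vectors, and it replaces real speech features with synthetic 2-D points. All names (`fit_gmm`, `enroll`, `identify`, `make_frames`) and all settings (two components, diagonal covariances, 30 EM iterations) are hypothetical choices made for illustration.

```python
import math
import random

def log_gauss_diag(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def fit_gmm(frames, k=2, iters=30, seed=0, var_floor=1e-3):
    """Fit a k-component diagonal GMM to feature frames with plain EM."""
    rng = random.Random(seed)
    n, dim = len(frames), len(frames[0])
    means = [list(f) for f in rng.sample(frames, k)]  # init means from data
    gmean = [sum(f[d] for f in frames) / n for d in range(dim)]
    gvar = [max(sum((f[d] - gmean[d]) ** 2 for f in frames) / n, var_floor)
            for d in range(dim)]
    variances = [gvar[:] for _ in range(k)]           # init with global variance
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior probability of each component for each frame
        resp = []
        for x in frames:
            logp = [math.log(weights[j]) + log_gauss_diag(x, means[j], variances[j])
                    for j in range(k)]
            norm = logsumexp(logp)
            resp.append([math.exp(lp - norm) for lp in logp])
        # M-step: re-estimate weights, means and variances
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-10
            weights[j] = nj / n
            means[j] = [sum(r[j] * x[d] for r, x in zip(resp, frames)) / nj
                        for d in range(dim)]
            variances[j] = [max(sum(r[j] * (x[d] - means[j][d]) ** 2
                                    for r, x in zip(resp, frames)) / nj, var_floor)
                            for d in range(dim)]
    return weights, means, variances

def avg_loglik(frames, model):
    """Average per-frame log-likelihood of the frames under a GMM."""
    weights, means, variances = model
    total = 0.0
    for x in frames:
        total += logsumexp([math.log(w) + log_gauss_diag(x, m, v)
                            for w, m, v in zip(weights, means, variances)])
    return total / len(frames)

def enroll(db, speaker, frames):
    """Registration mode: store a voice model for a known speaker."""
    db[speaker] = fit_gmm(frames)

def identify(db, frames):
    """Identification mode: pick the enrolled model that best explains the frames."""
    return max(db, key=lambda name: avg_loglik(frames, db[name]))

def make_frames(centers, n, rng):
    """Synthetic stand-in for real speech features (assumption, not real data)."""
    return [[rng.gauss(c, 1.0) for c in centers[i % len(centers)]]
            for i in range(n)]

rng = random.Random(42)
centers_a = [(0.0, 0.0), (3.0, 3.0)]     # hypothetical speaker "A"
centers_b = [(-4.0, 4.0), (-6.0, 1.0)]   # hypothetical speaker "B"

db = {}
enroll(db, "A", make_frames(centers_a, 200, rng))
enroll(db, "B", make_frames(centers_b, 200, rng))

test_a = make_frames(centers_a, 50, rng)
test_b = make_frames(centers_b, 50, rng)
print(identify(db, test_a), identify(db, test_b))
```

The decision rule here simply picks the enrolled model with the highest average log-likelihood. A practical system would first apply the processing steps listed above and would normally also need a way to reject speakers who were never registered.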
