Compressing English Speech Data with Hybrid Methods without Data Loss

Çiğdem Bakir

doi:10.18100/ijamec.1166951

Abstract

Understanding how speech is produced is essential for coding the speech signal successfully. It also supports applications ranging from authenticating audio files to linking a speech recording to the data acquisition device (e.g., a microphone) that captured it. Speech coding is vital to the acquisition, analysis, and evaluation of sound and to forensic criminal investigations. Speech and other sounds recorded as audio files play an important role in crime detection; to collect, process, analyze, extract, and evaluate them, the audio must be compressed without data loss. Because many voice-changing programs are available today, the number of recorded speech files and their correct interpretation play an important role in determining whether a recording is original. Techniques such as signal processing, noise extraction, and filtering are applied to an unintelligible speech recording to enhance the speech and make it intelligible, and to determine whether the recording has been manipulated, whether it is original, and whether material has been added or removed. The sounds are coded, and the code must then be decoded and the decoded sounds transcribed. This study first describes what sound coding is, its purposes and areas of use, and how sound coding is classified by features and techniques. Speech coding was then performed on English audio data, a real-world dataset of approximately 100,000 voice recordings. Coding was carried out with waveform, vocoder, and hybrid methods, and the success of each method was measured on the system we built. The hybrid methods gave more successful results than the others. These results will serve as a basis for our future work.
