Research on Algorithm of Video Analysis System Based on Text Error Correction

Jiachen Luo, Jinjin Wang, Jiaqi Lu, Yang Qin, Guo Huang, Jiahao Shi

doi:10.54097/fcis.v2i3.5510

Abstract

When a recorded video contains a spoken-language error, it usually has to be re-recorded, because there is no effective way to remove inappropriate or unnatural-sounding passages from the recording. To address this problem, this paper studies speech extraction, error correction, and synthesis for video, divided into three parts: (1) speech segmentation and speech-to-text conversion of the video; (2) error correction of the recognized text; (3) text-to-speech and merging of the synthesized speech into the video. For the first part, we apply a staged, efficient algorithm based on the Bayesian Information Criterion (BIC) and Statistical Mean Euclidean Distance (MEdist) to segment the video's speech; the segmented audio is then denoised by subtraction and finally converted to text using the iFLYTEK interface. For the second part, we apply the Double Automatic Error Correction (DAEC) algorithm to the text. For the third part, we use Improved Chinese Realtime Voice Cloning (I-Zhrtvc) for text-to-speech and then merge the voice into the video. Simulation results show that the staged BIC & MEdist algorithm segments the audio accurately by sentence, can recognize audio with dialect accents, and achieves high speech-to-text accuracy, with an average of up to 95.8%. The DAEC algorithm has a high error correction rate, and the prosody of the synthesized audio is highly accurate. ZVTOW text-to-speech reaches a Mean Opinion Score (MOS) of up to 4.5.
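The abstract does not spell out the staged BIC & MEdist procedure, so the following is only a minimal sketch of the standard ΔBIC change-point test that BIC-based audio segmentation generally builds on, not the authors' algorithm. It assumes each audio frame has already been turned into a feature vector (for example MFCCs), models each segment as a single full-covariance Gaussian, and treats a positive ΔBIC as evidence of a boundary. The function names `delta_bic` and `find_change_point`, the `margin` parameter, and the `penalty_weight` parameter are hypothetical and chosen for illustration.

```python
import numpy as np


def _logdet_cov(x):
    """Log-determinant of the maximum-likelihood covariance of the rows of x."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    _sign, logdet = np.linalg.slogdet(cov)
    return logdet


def delta_bic(features, i, penalty_weight=1.0):
    """Delta-BIC for splitting `features` (shape: frames x dims) at frame i.

    Compares one Gaussian over all frames against two Gaussians, one per
    side of the split. A positive value favours a boundary at i.
    """
    n, d = features.shape
    full = n * _logdet_cov(features)
    left = i * _logdet_cov(features[:i])
    right = (n - i) * _logdet_cov(features[i:])
    # Extra parameters of the second Gaussian: d means + d(d+1)/2 covariances.
    n_params = d + d * (d + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    return 0.5 * (full - left - right) - penalty


def find_change_point(features, margin=20, penalty_weight=1.0):
    """Return (index, score) of the best split, or (None, score) if no split
    has positive delta-BIC. `margin` keeps enough frames on each side for a
    stable covariance estimate (an illustrative choice, not from the paper).
    """
    n = features.shape[0]
    if n <= 2 * margin:
        raise ValueError("need more than 2 * margin frames")
    candidates = range(margin, n - margin)
    scores = [delta_bic(features, i, penalty_weight) for i in candidates]
    best = int(np.argmax(scores))
    if scores[best] > 0:
        return candidates[best], scores[best]
    return None, scores[best]


if __name__ == "__main__":
    # Synthetic stand-in for two acoustically different stretches of audio.
    rng = np.random.default_rng(0)
    frames = np.vstack([rng.normal(0.0, 1.0, (200, 2)),
                        rng.normal(5.0, 1.0, (200, 2))])
    print(find_change_point(frames))
```

In a full segmenter this test would be applied repeatedly over a sliding or growing window; how the paper combines it with MEdist and stages the search is not stated in the abstract and is left out here.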

Full Text