A Three-Stage Text Normalization Strategy for Mandarin Text-to-Speech Systems

Tao Zhou, Haila Wang, Yuan Dong, Wu Liu, Dezhi Huang

doi:10.1109/chinsl.2008.ecp.43

Open Access

Publication Date: Dec 1, 2008

Citations: 22

Affiliation: Beijing University of Posts and Telecommunications, Orange (France)

#Text Normalization #Large-scale Chinese Corpus

Abstract

Text normalization is an important component of a Mandarin Text-to-Speech (TTS) system. This paper develops a taxonomy of Non-Standard Words (NSWs) based on a large-scale Chinese corpus and proposes a three-stage text normalization strategy: Finite State Automata (FSA) for initial classification, a Maximum Entropy (ME) classifier combined with rules for further classification, and general rules for conversion to standard words. In experiments, the three-stage approach achieves a precision of 96.02%, which is 5.21% higher than that of a simple rule-based approach and 2.21% higher than that of a simple machine learning method. The experimental results show that the three-stage disambiguation strategy considerably improves text normalization and works well in a real TTS system.
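
The abstract gives no implementation details, so the following Python sketch only illustrates the general shape of a three-stage pipeline of this kind. Everything in it is an assumption made for illustration: the function names (`stage1_detect`, `stage2_refine`, `stage3_convert`, `normalize`), the class labels, and the rules themselves are hypothetical and are not taken from the paper. A regular expression stands in for the FSA in stage one, and a few hand-written context rules stand in for the trained ME classifier in stage two.

```python
import re

DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]

# Stage 1: a regular expression (equivalent to a finite-state automaton)
# finds NSW candidates and assigns a coarse class. Hypothetical classes.
NSW_PATTERN = re.compile(r"(?P<PERCENT>\d+(?:\.\d+)?%)|(?P<NUMBER>\d+(?:\.\d+)?)")

def stage1_detect(text):
    """Return (start, end, coarse_class) for every NSW candidate."""
    return [(m.start(), m.end(), m.lastgroup) for m in NSW_PATTERN.finditer(text)]

# Stage 2: context rules standing in for the ME classifier (an assumption;
# a real system would use a trained model here).
def stage2_refine(text, start, end, coarse):
    token = text[start:end]
    if coarse == "PERCENT":
        return "PERCENT"
    if "." in token:
        return "DECIMAL"
    is_year = len(token) == 4 and text[end:end + 1] == "年"
    if is_year or len(token) > 4 or token.startswith("0"):
        return "DIGIT_STRING"  # read digit by digit
    return "CARDINAL"

def read_digits(s):
    return "".join(DIGITS[int(c)] for c in s)

def read_cardinal(n):
    """Read an integer below 10000 as a Chinese cardinal number."""
    if n >= 10000:
        return read_digits(str(n))  # larger values are out of scope here
    if n == 0:
        return "零"
    s, out, pending_zero = str(n), [], False
    for i, ch in enumerate(s):
        d = int(ch)
        if d == 0:
            pending_zero = True
            continue
        if pending_zero:
            out.append("零")  # collapse internal zeros into one
        pending_zero = False
        out.append(DIGITS[d] + UNITS[len(s) - 1 - i])
    result = "".join(out)
    return result[1:] if result.startswith("一十") else result

def read_decimal(s):
    whole, _, frac = s.partition(".")
    spoken = read_cardinal(int(whole))
    return spoken + "点" + read_digits(frac) if frac else spoken

# Stage 3: general rules convert each classified NSW to standard words.
def stage3_convert(token, fine):
    if fine == "PERCENT":
        return "百分之" + read_decimal(token[:-1])
    if fine == "DECIMAL":
        return read_decimal(token)
    if fine == "DIGIT_STRING":
        return read_digits(token)
    return read_cardinal(int(token))

def normalize(text):
    out, pos = [], 0
    for start, end, coarse in stage1_detect(text):
        fine = stage2_refine(text, start, end, coarse)
        out.append(text[pos:start])
        out.append(stage3_convert(text[start:end], fine))
        pos = end
    out.append(text[pos:])
    return "".join(out)

print(normalize("2008年精度为96.02%"))
```

In this sketch the same digit string can be read in different ways depending on context: `2008` followed by `年` is read digit by digit as a year, while `105` in `共105人` is read as a cardinal. That kind of ambiguity is what a second classification stage is meant to resolve.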
