Text normalization in Mandarin text-to-speech system

Yuxiang Jia, Shiwen Yu, Dezhi Huang, Yuan Dong, Wu Liu, Haila Wang

doi:10.1109/icassp.2008.4518704

Publication Date: Mar 1, 2008

Citations: 10

Affiliation: Peking University, Orange (France)

Keywords: Maximum Entropy Classifiers, Non-standard Words, Finite State Automata, Text Normalization, Maximum Entropy, Two-stage Approach, High Performance Classifiers, Initial Entropy, Classification Entropy, Two-stage Strategy

Abstract

Text normalization is an important component of a text-to-speech (TTS) system, and its main difficulty lies in disambiguating non-standard words (NSWs). This paper develops a taxonomy of NSWs based on a large-scale Chinese corpus and proposes a two-stage NSW disambiguation strategy: finite state automata (FSA) for initial classification and maximum entropy (ME) classifiers for subclass disambiguation. Using this taxonomy, the two-stage approach achieves an F-score of 98.53% in the open test, 5.23% higher than that of the FSA-based approach. Experiments show that the NSW taxonomy gives the FSA a high baseline performance, that the ME classifiers yield a considerable further improvement, and that the two-stage approach adapts well to new domains.
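The two-stage strategy can be illustrated with a minimal Python sketch. It is not the authors' implementation: the coarse classes (DIGITS, DECIMAL, COLON, PERCENT), the subclasses (CARDINAL, YEAR, DIGIT_STRING, TIME, RATIO), the context features and the weights are all hypothetical. Stage 1 is a small hand-built deterministic finite automaton over character classes. Stage 2 runs only for coarse classes that remain ambiguous and scores each candidate subclass y with the standard maximum entropy (log-linear) form p(y | x) = exp(Σ_i λ_i f_i(x, y)) / Z(x), where each f_i is an indicator of a (context feature, label) pair. The weights λ_i are set by hand here; a real system would estimate them from labeled corpus data.

```python
import math
from typing import Dict, List, Tuple

# Stage 1: a hand-built deterministic finite automaton (DFA).
# The coarse classes below are hypothetical, not the paper's taxonomy.

def char_class(c: str) -> str:
    """Map a character to the DFA's input alphabet."""
    if "0" <= c <= "9":
        return "d"
    return c if c in ".:%" else "other"

# (state, input class) -> next state; missing entries mean "reject".
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("start", "d"): "int",
    ("int", "d"): "int",
    ("int", "."): "dot",
    ("dot", "d"): "dec",
    ("dec", "d"): "dec",
    ("int", ":"): "colon",
    ("colon", "d"): "pair",
    ("pair", "d"): "pair",
    ("int", "%"): "pct",
    ("dec", "%"): "pct",
}
ACCEPTING = {"int": "DIGITS", "dec": "DECIMAL", "pair": "COLON", "pct": "PERCENT"}

def fsa_classify(token: str) -> str:
    """Run the DFA over the token and return its coarse class."""
    state = "start"
    for c in token:
        state = TRANSITIONS.get((state, char_class(c)), "reject")
        if state == "reject":
            return "UNKNOWN"
    return ACCEPTING.get(state, "UNKNOWN")

# Stage 2: maximum entropy (log-linear) subclass disambiguation.
# Coarse classes that stage 1 cannot resolve, and their candidate subclasses.
SUBCLASSES: Dict[str, List[str]] = {
    "DIGITS": ["CARDINAL", "YEAR", "DIGIT_STRING"],
    "COLON": ["TIME", "RATIO"],
}

# Hypothetical hand-set weights lambda_i for (feature, label) indicator pairs;
# a real system would estimate these from a labeled corpus.
WEIGHTS: Dict[Tuple[str, str], float] = {
    ("bias", "CARDINAL"): 0.5,
    ("right=年", "YEAR"): 3.0,            # "1998年" -> a year
    ("len=4", "YEAR"): 1.0,
    ("left=电话", "DIGIT_STRING"): 3.0,   # "电话13912345678" -> phone number
    ("len=11", "DIGIT_STRING"): 2.5,
    ("bias", "TIME"): 0.5,
    ("left=上午", "TIME"): 2.0,           # "上午10:30" -> clock time
    ("left=比分", "RATIO"): 3.0,          # "比分3:2" -> score/ratio
}

def features(token: str, left: str, right: str) -> List[str]:
    """Context predicates: token length, up to two left chars, one right char."""
    return ["bias", f"len={len(token)}", f"left={left[-2:]}", f"right={right[:1]}"]

def me_probs(feats: List[str], labels: List[str]) -> Dict[str, float]:
    """p(y|x) = exp(sum_i lambda_i * f_i(x, y)) / Z(x)."""
    scores = {y: sum(WEIGHTS.get((f, y), 0.0) for f in feats) for y in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: v / z for y, v in exps.items()}

def classify(token: str, left: str = "", right: str = "") -> str:
    """Two-stage NSW classification: DFA first, ME only if still ambiguous."""
    coarse = fsa_classify(token)
    if coarse not in SUBCLASSES:
        return coarse  # stage 1 is decisive for this class
    probs = me_probs(features(token, left, right), SUBCLASSES[coarse])
    return max(probs, key=probs.get)

print(classify("1998", right="年"))   # YEAR
print(classify("3:2", left="比分"))   # RATIO
```

For example, "1998" followed by 年 is labeled YEAR (a year, which Mandarin reads digit by digit), while "3:2" after 比分 is labeled RATIO rather than TIME. Keeping stage 1 deterministic makes the unambiguous cases cheap and predictable, and confines the statistical model to the subclass decisions where context actually matters.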
