Abstract

This research introduces automatic annotation methods for Korean text and speech, aimed at making the construction of Korean annotation datasets more efficient. For text annotation, we use a Seq2Seq architecture with BERT and bidirectional GRU models, which improves the model's ability to capture context and produce precise annotations. For speech annotation, we combine Hidden Markov Model forced alignment with semi-supervised learning and use Seneff auditory features to detect phonological consonant boundaries. Experiments on diverse datasets show 96.01% accuracy in text annotation and phonological boundary detection at a minimum distance threshold of 14.5 ms, outperforming traditional algorithms.
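The abstract does not spell out how boundary detection is scored. One common convention, assumed here purely for illustration, is to count a reference boundary as detected when some predicted boundary lies within a fixed tolerance of it. The sketch below treats 14.5 ms as that tolerance; the function name `boundary_hit_rate` and its parameters are hypothetical and are not taken from the paper.

```python
from bisect import bisect_left


def boundary_hit_rate(predicted_ms, reference_ms, tolerance_ms=14.5):
    """Fraction of reference boundaries with a predicted boundary within tolerance.

    Assumption: 14.5 ms is read as a tolerance window around each reference
    boundary. The paper may define its threshold differently.
    Note: this simple version does not enforce one-to-one matching, so one
    predicted boundary may satisfy several nearby reference boundaries.
    """
    if not reference_ms:
        return 0.0
    predicted = sorted(predicted_ms)
    hits = 0
    for ref in reference_ms:
        # Locate the insertion point; the nearest predictions sit on either side.
        i = bisect_left(predicted, ref)
        candidates = predicted[max(i - 1, 0):i + 1]
        if any(abs(p - ref) <= tolerance_ms for p in candidates):
            hits += 1
    return hits / len(reference_ms)


# Illustrative timestamps in milliseconds (made up, not from the paper).
rate = boundary_hit_rate([100.0, 250.0, 400.0], [110.0, 270.0, 395.0])
```

In this made-up example, the reference boundaries at 110 ms and 395 ms each have a prediction within 14.5 ms, while the one at 270 ms does not, so two of three references count as detected.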
