Abstract

To describe temporal characteristics precisely, disagreements between manual and automatic phone labeling were analyzed quantitatively with respect to three sources: spectral feature extraction, the adoption of acoustic matchers (HMMs), and the acoustic matcher itself. The error analysis shows that automatically placed boundaries are shifted at phone boundaries where the speech spectrum changes rapidly. This disagreement results from spectral feature extraction, which averages the signal over a given window. Errors attributable to adopting the model are large at phone boundaries where the spectrum changes slowly. The third type, model-dependent errors, occurs at phones for which the aligned duration cannot be shorter than the frame increment period multiplied by the number of HMM states. To account for these error factors individually and reduce alignment errors, we modified the automatic alignment results in a context-dependent manner, using the statistical characteristics of phone boundary displacement. This boundary-modification post-processing reduces the boundary error from 14.79 ms to 11.07 ms. A supplementary experiment shows that this improvement of about 4 ms is roughly eight times the error reduction obtained by speaker adaptation of the acoustic matchers. [Work supported by TAO, Japan.]
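The context-dependent boundary modification described above can be illustrated with a minimal sketch. It is not the authors' method. It assumes the phone context is simply the (left phone, right phone) pair, that the correction is the mean signed displacement between manual and automatic boundaries observed in training data, and that the HMM is left-to-right without skip transitions, so each state consumes at least one frame. All function names and the toy numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def learn_offsets(pairs):
    """Estimate a mean boundary displacement per phone context.

    pairs: iterable of (left_phone, right_phone, manual_ms, auto_ms).
    Returns {(left_phone, right_phone): mean of (manual_ms - auto_ms)}.
    Assumption: context = the two phones adjacent to the boundary.
    """
    by_context = defaultdict(list)
    for left, right, manual_ms, auto_ms in pairs:
        by_context[(left, right)].append(manual_ms - auto_ms)
    return {ctx: mean(diffs) for ctx, diffs in by_context.items()}


def correct_boundaries(boundaries, offsets):
    """Shift each automatic boundary by the offset learned for its context.

    boundaries: list of (left_phone, right_phone, auto_ms).
    Contexts never seen in training are left unchanged.
    """
    return [(l, r, t + offsets.get((l, r), 0.0)) for l, r, t in boundaries]


def mean_abs_error(manual_ms, predicted_ms):
    """Mean absolute boundary error in milliseconds."""
    return mean(abs(m - p) for m, p in zip(manual_ms, predicted_ms))


def min_phone_duration_ms(frame_shift_ms, n_states):
    """Shortest duration a left-to-right HMM without skips can align:
    one frame per state, i.e. frame increment period times state count."""
    return frame_shift_ms * n_states


# Toy training data (hypothetical values, not from the paper).
training = [
    ("a", "k", 100, 110),
    ("a", "k", 200, 206),
]
offsets = learn_offsets(training)
corrected = correct_boundaries([("a", "k", 310.0), ("s", "i", 400.0)], offsets)
```

In this toy case the automatic boundaries for the a-k context fall on average 8 ms late, so a new a-k boundary is moved 8 ms earlier, while the unseen s-i context is left untouched. The helper `min_phone_duration_ms` only restates the duration floor mentioned in the abstract: with a 5 ms frame increment and three states, no phone can be aligned shorter than 15 ms.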
