Abstract
We introduce a simple and efficient frame and segment level RNN model (FS-RNN) for phone classification. It processes the input at frame level and segment level by bidirectional gated RNNs. This type of processing is important to exploit the (temporal) information more effectively compared to (i) models which solely process the input at frame level and (ii) models which process the input on segment level using features obtained by heuristic aggregation of frame level features. Furthermore, we incorporated the activations of the last hidden layer of the FS-RNN as an additional feature type in a neural higher-order CRF (NHO-CRF). In experiments, we demonstrated excellent performance on the TIMIT phone classification task, reporting a performance of 13.8% phone error rate for the FS-RNN model and 11.9% when combined with the NHO-CRF. In both cases we significantly exceeded the state-of-the-art performance.