Abstract

This paper examines multilingual audio Query-by-Example (QbE) retrieval using the posteriorgram-based Phonetic Unit Modelling (PUM) approach and the Weighted Fast Sequential Dynamic Time Warping (WFSDTW) algorithm. The PUM approach employs phone recognizers trained in a supervised way on language-specific external resources, so information about the phonetic distribution is embedded in the acoustic modelling process. The resulting acoustic models were also used for language-independent QbE retrieval. The improved WFSDTW algorithm was implemented to retrieve each query (keyword) within a given utterance file. The main focus is on measuring the retrieval performance of the proposed WFSDTW solution, which employs posteriorgram-based keyword matching with Gaussian mixture modelling (GMM). Score normalization and fusion of four different language-dependent sub-systems were carried out using a simple max-score merging strategy. The results show a certain advantage of the proposed WFSDTW solution over the two other evaluated techniques, namely the basic DTW and segmental DTW algorithms. In addition, the combination of multiple PUM techniques with WFSDTW proved to be an effective solution for the QbE task.
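As a rough illustration of the pipeline outlined above, the following minimal sketch matches a query posteriorgram against an utterance posteriorgram with the basic DTW baseline (not WFSDTW, whose weighting and speed-up details are not given here) and then fuses per-language scores by max-score merging. The frame distance (negative log of the inner product), the z-score normalization, and all function names are assumptions made for this example, not details taken from the paper.

```python
import math


def frame_distance(p, q, eps=1e-10):
    """Assumed local distance: -log of the inner product of two posterior vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    return -math.log(max(dot, eps))


def dtw_cost(query, utterance):
    """Length-normalized cost of aligning the whole query to the whole utterance (basic DTW)."""
    n, m = len(query), len(utterance)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], utterance[j - 1])
            # Standard step pattern: vertical, horizontal, or diagonal predecessor.
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m] / (n + m)


def z_normalize(scores):
    """Assumed normalization: zero mean, unit variance per sub-system."""
    mean = sum(scores) / len(scores)
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    if std < 1e-12:
        std = 1.0  # avoid division by zero when all scores are equal
    return [(s - mean) / std for s in scores]


def max_score_merge(per_system_scores):
    """Fuse sub-systems by keeping, per utterance, the highest normalized score."""
    normalized = [z_normalize(scores) for scores in per_system_scores]
    return [max(column) for column in zip(*normalized)]


# Toy posteriorgrams over two phonetic units (hypothetical data).
query = [[1.0, 0.0], [0.0, 1.0]]
matching = [[1.0, 0.0], [0.0, 1.0]]
non_matching = [[0.0, 1.0], [1.0, 0.0]]

# Similarity scores are negated costs, so that higher means a better match.
system_a = [-dtw_cost(query, matching), -dtw_cost(query, non_matching)]
system_b = [0.0, 1.0]  # hypothetical scores from a second language-dependent sub-system
fused = max_score_merge([system_a, system_b])
```

In this sketch each sub-system is normalized independently before merging, so that scores produced by recognizers for different languages are on a comparable scale when the maximum is taken.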
