Abstract

In this paper, we extend our previous study on the problem of automatically identifying question and non-question segments in Arabic monologues using prosodic features. We propose two novel classification approaches to this problem: one based on type-2 fuzzy logic systems (type-2 FLS) and the other on the discriminative sensitivity-based linear learning method (SBLLM). Prosodic features have been used in many speech-related applications, such as speaker and word recognition, emotion and accent identification, topic and sentence segmentation, and text-to-speech. We continue to focus specifically on the Arabic language, since other languages have already received considerable attention in this regard. Moreover, we aim to improve on the performance of our previously used techniques, of which the support vector machine (SVM) was the best performing, by applying the two classification approaches mentioned above. The recorded continuous speech is first segmented into sentences using both energy and time-duration parameters. Prosodic features are then extracted from each sentence and fed into each of the two proposed classifiers, which label the sentence as either a Question or a Non-Question. Our extensive simulations, based on a moderately sized database, show that both proposed classifiers outperform SVM in all of the experiments carried out, with the type-2 FLS classifier consistently exhibiting the best performance, which we attribute to its ability to handle uncertainty.
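As a rough illustration of the segmentation step described above, the following Python sketch splits a mono signal wherever the short-time energy stays low for longer than a minimum pause duration. It is not the authors' implementation: the function name `segment_by_energy`, the frame length, the energy threshold (a fraction of the peak frame energy), and the minimum pause length are all assumptions chosen for the example.

```python
import numpy as np


def segment_by_energy(signal, sample_rate, frame_ms=20.0,
                      energy_ratio=0.1, min_pause_ms=300.0):
    """Return (start_s, end_s) segments separated by long low-energy pauses.

    All default parameter values are illustrative assumptions,
    not values taken from the paper.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        return []

    # Short-time energy: mean squared amplitude per non-overlapping frame.
    frames = np.asarray(signal[:n_frames * frame_len], dtype=float)
    frames = frames.reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)

    # A frame counts as speech if its energy exceeds a fraction of the peak.
    voiced = energy > energy_ratio * energy.max()
    min_pause_frames = int(np.ceil(min_pause_ms / frame_ms))

    segments = []
    start = None       # index of the first frame of the open segment
    silence_run = 0    # consecutive silent frames seen inside the segment
    for i, is_voiced in enumerate(voiced):
        if is_voiced:
            if start is None:
                start = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_pause_frames:
                # Close the segment at the first frame of the pause.
                end = i - silence_run + 1
                segments.append((start * frame_ms / 1000,
                                 end * frame_ms / 1000))
                start = None
                silence_run = 0

    # Close a segment still open at the end, dropping trailing silence.
    if start is not None:
        end = n_frames - silence_run
        segments.append((start * frame_ms / 1000, end * frame_ms / 1000))
    return segments


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 200 * t)          # 1 s stand-in for speech
    pause = np.zeros(sr // 2)                   # 0.5 s of silence
    print(segment_by_energy(np.concatenate([tone, pause, tone]), sr))
```

Each returned segment would then be passed on to prosodic feature extraction and classification. In practice the thresholds would need tuning to the recording conditions, since a fixed fraction of the peak energy is sensitive to background noise.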

Highlights

  • There has been a huge increase in the amount of data generated and stored as computers and the Internet increasingly become part of everyday life

  • We extend our previous study on automatically identifying question and non-question segments in Arabic monologues using prosodic features

  • We propose two novel approaches, based on type-2 fuzzy logic systems and the sensitivity-based linear learning method (SBLLM), to identifying question and non-question segments in a monologue using prosodic features


Summary

Introduction

There has been a huge increase in the amount of data generated and stored as computers and the Internet increasingly become part of everyday life. This information exists in various formats: text, audio, and video. With the availability of greater bandwidth for Internet communication, audio and video content on the Internet has grown alongside the text and image data that people were previously accustomed to. Audio and video content is widely shared through peer-to-peer file-sharing networks. Multimedia content constitutes the bulk of Internet traffic, in the form of IP telephony, video and audio conferencing, Internet radio stations, music stores, lecture sites, etc.
