Abstract
Joint attention (AC) is a human skill essential for individual development, including language learning. Experimental studies of AC commonly involve analyzing video recordings of scenes with interactions between individuals, where some elements, including each participant's interventions, are registered manually. In this work, a speaker identification system is proposed for AC analysis, providing the sequence of interventions of each speaker in videos of AC scenarios. To support the implementation, a comparison of the most common speaker identification techniques is provided. These techniques include the Mel Frequency Cepstral Coefficients (MFCC) and the combination MFCC + deltaMFCC. For classification, Gaussian mixture models (GMM) and support vector machines (SVM) were employed. Results after a 5-fold cross-validation process, with 30 audio segments of 3-4 seconds each, show an accuracy close to 90% using MFCC+deltaMFCC with SVM. This result demonstrates the feasibility of implementing the proposed system.
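As context for the feature set mentioned above, the following is a minimal sketch of how delta coefficients are typically computed and appended to MFCC vectors to form MFCC+deltaMFCC features. It uses NumPy only; the frame count, number of coefficients, and regression window width are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def delta(mfcc, N=2):
    """Standard delta (first-derivative) coefficients over time.

    mfcc: array of shape (num_frames, num_coeffs).
    N: half-width of the regression window (N=2 is a common default).
    """
    # Pad at the edges so every frame has N neighbors on each side.
    padded = np.pad(mfcc, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    d = np.zeros_like(mfcc, dtype=float)
    for t in range(mfcc.shape[0]):
        # Weighted difference of trailing and leading frames.
        d[t] = sum(
            n * (padded[t + N + n] - padded[t + N - n])
            for n in range(1, N + 1)
        ) / denom
    return d

# Hypothetical example: 100 frames of 13 MFCCs (both numbers are assumptions).
rng = np.random.default_rng(0)
mfcc = rng.standard_normal((100, 13))

# MFCC + deltaMFCC: concatenation doubles the feature dimension (13 -> 26);
# the resulting vectors would then be fed to a GMM or SVM classifier.
features = np.hstack([mfcc, delta(mfcc)])
print(features.shape)
```

Appending deltas gives the classifier information about how the spectral envelope changes over time, which is why MFCC+deltaMFCC often outperforms static MFCCs alone.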