Abstract

A speech database is an essential element of speech recognition research. This paper describes the development of the SCEHMA speech database, intended to advance speech recognition applications in the Hindi, English, Marathi and Arabic languages. The SCEHMA corpus is a collection of isolated words and continuous sentences. For the application domains of agriculture, polyclinic and general-purpose speech recognition in Marathi, 28420 isolated words and 17470 sentences were collected from 300 male and 200 female speakers aged 22–30. The Hindi portion consists of 900 sentences for the accent recognition domain, collected from 18 male and 12 female speakers aged 18–30. The English portion consists of 750 isolated words and 750 sentences for the general domain, collected from 12 male and 3 female speakers aged 22–30. The Arabic portion contains 4520 words and 40 sentences for the recognition domain, collected from 12 male and 9 female speakers aged 18–30. To achieve a high-quality speech corpus, recording took place in a 10 by 10 office room free of background noise. The utterances were recorded at 16 kHz using three recording media: a headset, a desktop-mounted microphone and a mobile phone. Recordings were made in morning and evening sessions at room temperature and normal humidity. Each speaker was asked to sit facing the microphone at a distance of about 12–15 cm. The database was collected following the LDCIL protocol, and the corpus is transcribed using the Google Unicode editor. Praat is used for corpus labeling and annotation. In total, the SCEHMA corpus contains 33690 isolated words and 19160 continuous sentences. The corpus will be made available to the scientific community for agricultural, polyclinic, medical, accent recognition, age group identification, gender recognition and general-purpose recognition systems once transcription and annotation are complete.
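
The stated corpus totals can be checked against the per-language figures with a short tally. The following is a minimal sketch, not part of the SCEHMA tooling: the names CORPUS and totals are hypothetical, and the Hindi isolated-word count is assumed to be zero because the abstract reports only sentences for Hindi.

```python
# Per-language counts as reported in the abstract.
# Assumption: Hindi contributes no isolated words (only sentences are reported).
CORPUS = {
    "Marathi": {"words": 28420, "sentences": 17470},
    "Hindi": {"words": 0, "sentences": 900},
    "English": {"words": 750, "sentences": 750},
    "Arabic": {"words": 4520, "sentences": 40},
}


def totals(corpus):
    """Sum isolated-word and sentence counts over all languages."""
    return {
        key: sum(part[key] for part in corpus.values())
        for key in ("words", "sentences")
    }


if __name__ == "__main__":
    print(totals(CORPUS))
```

Under that assumption, the per-language figures sum to the 33690 isolated words and 19160 continuous sentences reported for the whole corpus.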
