Abstract

Speech is the basic mode of interaction between a speaker and a listener, carried by voice and expression. Humans easily understand a speaker's message, but machines cannot. Machines now occupy much of daily life, yet natural interaction with them remains difficult, so machine learning, which mimics aspects of the human brain, is essential for speech recognition that lets humans and machines interact. Because the language used for speech recognition should be widely spoken, English is used in this paper. Machine learning methods have been applied to many tasks through their feature-learning capability, and their data-modelling capability yields results beyond those of conventional learning methods. In this work, speech signal recognition is therefore based on a machine learning algorithm that merges speech features and attributes. Because voice serves as a biometric cue, the speech signal becomes a significant element of speech enhancement, and a new speech and emotion recognition technique is introduced. This paper focuses on feature extraction, enhancement, segmentation, and the processing stages of speech emotion recognition. First, a trained RNN layer extracts high-level features from the speech signal. These high-level features are then used to generate new speech features for the capsule network. Finally, the obtained speech features and attribute features are combined in the same RNN-with-CapsNet framework through a fully connected network. Experimental results show that the proposed speech recognition algorithm improves accuracy over other state-of-the-art methods.
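The pipeline described above (RNN feature extraction, capsule-style transformation, fully connected fusion of speech and attribute features) can be sketched in miniature. This is an illustrative toy only, not the paper's implementation: the network sizes, random weights, `rnn_features`, `squash`, and `fuse` helpers, and the attribute vector are all assumptions made for demonstration.

```python
import math
import random

def squash(v, eps=1e-9):
    # Capsule-network squash nonlinearity: rescales a vector so its
    # length lies in [0, 1) while preserving its direction.
    norm_sq = sum(x * x for x in v)
    norm = math.sqrt(norm_sq) + eps
    scale = norm_sq / (1.0 + norm_sq)
    return [scale * x / norm for x in v]

def rnn_features(frames, hidden_size=4, seed=0):
    # Tiny Elman-style RNN with random fixed weights: the final hidden
    # state stands in for the "high-level features" of the frame sequence.
    rng = random.Random(seed)
    in_size = len(frames[0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(in_size)] for _ in range(hidden_size)]
    U = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)] for _ in range(hidden_size)]
    h = [0.0] * hidden_size
    for x in frames:
        h = [math.tanh(sum(W[i][j] * x[j] for j in range(in_size)) +
                       sum(U[i][j] * h[j] for j in range(hidden_size)))
             for i in range(hidden_size)]
    return h

def fuse(speech_feat, attr_feat):
    # Stand-in for the fully connected fusion stage: concatenate the
    # speech features with the attribute features, then squash the
    # result into a capsule-style vector.
    return squash(speech_feat + attr_feat)

frames = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]   # toy speech frames
feat = rnn_features(frames)                       # 4-dim high-level features
fused = fuse(feat, [0.2, -0.3])                   # hypothetical attribute features
print(len(fused))                                 # prints 6
```

The squash step guarantees the fused vector's length stays below 1, which in a real capsule network lets vector length be read as a class-presence probability.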
