The recent introduction of 5G technology in mobile internet is transforming the Internet of mobile Things (IoMT) by offering low latency, support for a large number of IoMT devices, and lower power consumption, thereby delivering cost-effective solutions for low-end devices. This technology enables a ubiquitous, connected, critical communication network between the healthcare system and IoT, both of which depend largely on low-end devices for gathering data at the point of care. The gathering and interpolation of data from things are moved to the cloud, which makes knowledge extraction and decision-making more robust. Vocal signals form the basis of communication between human beings, conveying complex data through variations in thrust, pitch, and tone. Representing and recognizing these analog signals in digital systems is both exciting and challenging. Spoken language is converted to digital signals so that it can be identified from different cues, such as phonetic, prosodic, phonotactic, and lexical features. Voice patterns tend to be specific to each individual, with a slight orientation towards the language spoken in that individual's region. While speech patterns can alter the meaning of words through tone and through high and low pitch in their utterance, NLP tends to learn specific associations between words through vectors. A focus on learned networks is needed to solve the problem of converting speech to text with minimal loss and high predictability at the level of the syllable, word, sentence, and paraphrase. Creating a knowledge-base corpus of learned, variable prosodic features helps interestingness to be learned directly, without perturbations. Learning algorithms are needed to realize the degree of understandability of speech, identifying words and sentences and transcribing them despite substantial noise interference.
Transferring the acoustic features learned by algorithms proves to be quite challenging, as they are distorted by sudden environmental changes. A syllable extracted from speech translation may or may not represent the sentiment of the word, given differences in phonetic modulation. Using MobileNets and DistilBERT to move language extraction to the edge reduces processing time and corpus size, and reduces both the adversarial learning of voice features and patterns and the transfer of the learned corpus and patterns.