Abstract

Text normalization is the process of mapping noncanonical text to a standardized form so that meaningful inferences can be drawn from it. Important sources include computer-mediated communication such as email and instant messaging. Internet-based communication offers a large amount of valuable raw data that requires text normalization before further processing; informal writing can quickly become a bottleneck for many natural language processing (NLP) tasks. Off-the-shelf tools are generally trained on well-formed text and cannot explicitly handle noisy, malformed input. Existing text normalization frameworks rely on phonetic similarity and alignment models that operate locally, yet handling contextual information is essential for this task. In this work, we propose a hybrid deep learning system that can effectively serve as a preprocessing step for NLP applications by correcting noisy text. Empirical results demonstrate that the proposed encoder-decoder model based on bidirectional LSTMs outperforms other text normalization mechanisms in accuracy.
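For intuition, the local similarity-based normalization that the abstract contrasts with the proposed contextual model can be sketched in a few lines. This is not the paper's bidirectional LSTM encoder-decoder; it is a minimal illustration of the purely local baseline, using Python's standard-library `difflib`. The lexicon, the `normalize` function, and the 0.6 similarity cutoff below are illustrative assumptions, not taken from the paper.

```python
import difflib

# Hypothetical toy lexicon of canonical word forms (illustration only).
LEXICON = ["hello", "see", "you", "tomorrow", "great"]

def normalize(text, lexicon=LEXICON, cutoff=0.6):
    """Replace each out-of-vocabulary token with its closest lexicon
    entry by character similarity, if one clears the cutoff.

    Each token is normalized in isolation: no surrounding context is
    consulted, which is exactly the limitation the abstract points out.
    """
    out = []
    for token in text.lower().split():
        if token in lexicon:
            out.append(token)  # already canonical; keep as-is
            continue
        # get_close_matches ranks lexicon entries by string similarity.
        match = difflib.get_close_matches(token, lexicon, n=1, cutoff=cutoff)
        out.append(match[0] if match else token)
    return " ".join(out)
```

A token like "helo" maps to "hello" by character overlap alone, but because each token is handled independently, such a system cannot use sentence context to disambiguate, which motivates the contextual encoder-decoder approach.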
