Emerging directions in predictive text mining

Nitin Indurkhya

doi:10.1002/widm.1154

Abstract

In recent years, Text Mining has grown rapidly as data scientists focus their attention on analyzing unstructured data. The main drivers of this growth have been big data and complex applications in which the information in text is often combined with other kinds of information to build predictive models. These applications require highly efficient and scalable algorithms to meet overall performance demands. In this context, six main directions are identified in which text mining research is heading: Deep Learning, Topic Models, Graphical Modeling, Summarization, Sentiment Analysis, and Learning from Unlabeled Text. Each direction has its own motivations and goals, though the concepts overlap somewhat because of the common themes of text and prediction. The predictive models typically involve meta-information or tags that can be added to the text. These tags can then be used in other text processing tasks such as information extraction. Although the boundary between Text Mining and Natural Language Processing is becoming increasingly blurred, the importance of predictive models for applications involving text means that there is still substantial growth potential within the traditional sub-fields of text mining. These data-centric directions are also likely to influence future research in Natural Language Processing, especially for resource-poor languages and multilingual texts.

WIREs Data Mining Knowl Discov 2015, 5:155–164. doi: 10.1002/widm.1154

This article is categorized under: Algorithmic Development > Text Mining; Technologies > Prediction; Technologies > Structure Discovery and Clustering

Full Text