Abstract

Recent years have seen dramatic growth in natural language text data (e.g., web pages, news articles, scientific literature, emails, enterprise documents, blog articles, forum posts, product reviews, and tweets). Text data contain all kinds of knowledge about the world as well as human opinions and preferences, and thus offer great opportunities for analyzing and mining vast amounts of text (“big text data”) to support user tasks and optimize decision making in all application domains. However, computers cannot yet accurately understand unrestricted natural language; as such, involving humans in the loop of interactive text mining is essential. In this talk, I will present the vision of TextScope, an interactive software tool that enables users to perform intelligent information retrieval and text analysis in a unified task-support framework. Just as a microscope allows us to see things in the “micro world,” and a telescope allows us to see things far away, the envisioned TextScope would allow us to “see” useful hidden knowledge buried in large amounts of text data that would otherwise remain unknown to us. As examples of techniques that can be used to build a TextScope, I will present some general statistical text mining algorithms that we have recently developed for joint analysis of text and non-text data to discover interesting patterns and knowledge. I will conclude the talk with a discussion of the major challenges in developing a TextScope and some important directions for future research in text data mining.

