Abstract
Real-world big data are largely unstructured, interconnected, and in the form of natural language text. One of the grand challenges is to turn such massive unstructured data into structured data, and then into structured networks and actionable knowledge. We propose a data-intensive text mining approach that requires only distant or minimal supervision but relies on massive data. We show that quality phrases can be mined from such massive text data, that types can be extracted from massive text data with distant supervision, and that relationships among entities can be discovered by meta-path guided network embedding. Finally, we propose a D2N2K (i.e., data-to-network-to-knowledge) paradigm: first turn data into relatively structured information networks, and then mine such text-rich and structure-rich networks to generate useful knowledge. We show that this paradigm represents a promising direction for turning massive text data into structured networks and useful knowledge.