Abstract

The last few years have seen a rapid increase of sheer amount of data produced and communicated over the Internet and the Web. While it is widely believed that the availability of such ``Big Data'' holds the potential to revolutionize many aspects of our modern society (e.g., intelligent transportation, environmental monitoring, and energy saving), many challenges need to be addressed before this potential can be realized. This PhD project focuses on one critical challenge, namely extracting actionable knowledge from Big Data. Tremendous efforts have been contributed on mining large-scale data on the Web and constructing comprehensive knowledge bases (KBs). However, existing knowledge extraction systems retrieve data from limited types of Web sources. In addition, data fusion approaches consider very little of the noises produced by those knowledge extraction systems. Consequently, the constructed KBs are far from being comprehensive and accurate. In this paper, we present our initial design of a framework for extracting machine-readable data with high precision and recall from four types of data sources, namely Web texts, Document Object Model (DOM) trees, existing KBs, and query stream. Confidence scores are attached to the resulting knowledge, which can be used to further improve the knowledge fusion results.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.