Abstract

The Web contains a massive amount of information embedded in text, and obtaining information from Web text is a major research challenge. One research focus is Open Information Extraction (OIE), which aims to develop relation-independent information extraction: OIE systems seek to extract all potential relations from the text rather than a few pre-defined relations. Existing OIE systems such as TEXTRUNNER usually take a machine-learning-based approach, which requires large volumes of training data. This paper presents a Ripple-Down Rules Open Information Extraction system based on processing example cases and manually adding rules when needed. The key advantages of this approach are that it can handle the freer writing style that occurs in Web documents and can correct errors introduced by natural language pre-processing tools, whereas systems like TEXTRUNNER depend on the quality of the entity-tagging preprocessing in the training data. We evaluated the Ripple-Down Rules approach against two OIE systems, TEXTRUNNER and StatSnowball. In these studies, the Ripple-Down Rules approach, with minimal low-cost rule addition, achieved much higher precision and somewhat improved recall compared to these other OIE systems.

