Abstract

Various information extraction (IE) systems for corporate usage exist. However, none of them targets the product development and/or customer service domain, despite its significant application potential and benefits. This domain also poses new scientific challenges, such as the lack of external knowledge resources and irregularities like ungrammatical constructs in textual data, which compromise successful information extraction. To address these issues, we describe the development of Textractor, an application for accurately extracting relevant concepts from irregular textual narratives in datasets of product development and/or customer service organizations. The extracted information can subsequently be fed into a host of business intelligence activities. We present novel algorithms, combining statistical and linguistic approaches, for the accurate discovery of relevant domain concepts from highly irregular or ungrammatical texts. Evaluations on real-life corporate data revealed that Textractor extracts domain concepts, realized as single- or multi-word terms in ungrammatical texts, with high precision.

