Abstract

Acquiring relevant business concepts is a crucial first step for any software project for which the software experts are not domain experts. The wealth of information buried within an organization's written documentation is a precious source of concepts, relationships and attributes which can be used to model the enterprise's domain. The lack of targeted extraction tools can make perusing through this type of resource a lengthy and costly process. We propose a domain model focused extraction process aimed at the rapid discovery of knowledge relevant to the software expert. To avoid undesirable noise from high-level linguistic tools, the process is mainly composed of positive and negative base filters that are less error prone and more robust. The extracted candidates are then reordered using a weight propagation algorithm based on structural hints from source documents. When tested on French text corpora from public organizations, our process performs 2.7 times better than a statistical baseline for relevant concept discovery. A new metric to assess the performance discovery speed of relevant concepts is introduced. The annotation of a gold standard definition of software engineering oriented concepts for knowledge extraction tasks is also presented.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.