Abstract

Software development is still considered a bottleneck for Small and Medium Enterprises (SMEs) in the advance of the Information Society. SMEs usually collect and store a large amount of textual software documentation; these documents could be used profitably to help them use (and re-use) Software Engineering methods to design their applications systematically, thus reducing software development costs. Specific, semantics-based text filtering and search mechanisms, which support the identification of processes and practices suited to the enterprise's needs, are fundamental in this context. To this aim, we present an automatic document retrieval method based on semantic similarity and Word Sense Disambiguation techniques. The proposal leverages the strengths of both classic information retrieval and knowledge-based techniques, exploiting syntactic and semantic information provided by general and domain-specific knowledge sources. For any SME, it is as easily and generally applicable as the search techniques offered by common enterprise Content Management Systems. Our method was developed within the FACIT-SME European FP-7 project, whose aim is to facilitate the diffusion of Software Engineering methods and best practices among SMEs. As shown by a detailed experimental evaluation, the achieved effectiveness goes well beyond that of typical retrieval solutions.

Highlights

  • One of the main bottlenecks for the development of the Information Society (Aetic (Spain) and Agoria (Belgium) and AssInform (Italy) et al, 24 October 2008) has been software development, as the quality and productivity of work have not been able to keep up with society's software needs (DG INFSO Internal Reflection Group on Software Technologies, ITEA, April 2002; Standish Group, 2006)

  • According to the analysis performed by the INNOSme project (InnoSME Project, 2008) across several countries, these issues are especially critical for software SMEs (Small and Medium Enterprises): the available resources cannot be devoted to new technology training as they are absorbed in the activity of software production

  • The limitations of standard syntactical techniques are overcome by considering the semantics intrinsically associated with the document/query terms and by addressing the problem of term ambiguity through the use of Word Sense Disambiguation (WSD)


Summary

Introduction and Motivations

One of the main bottlenecks for the development of the Information Society (Aetic (Spain) and Agoria (Belgium) and AssInform (Italy) et al, 24 October 2008) has been software development. SMEs collect and store a large amount of textual software documentation: text is everywhere, and even test cases and inline comments could be useful knowledge sources (Lethbridge, Singer and Forward, 2003). This textual information could be used profitably to help them use (and re-use) Software Engineering methods for developing their applications, but their inadequate information systems often prevent them from doing so (Garg, Goyal and Lather, 2010). The limitations of standard syntactical techniques (such as the ones usually exploited by enterprise CMSs) are overcome by considering the semantics intrinsically associated with the document/query terms and by addressing the problem of term ambiguity through the use of Word Sense Disambiguation (WSD). To this aim, we exploit different kinds of external knowledge sources (both general and domain-specific dictionaries or thesauri).
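To make the contrast between syntactic and sense-aware matching concrete, here is a minimal Python sketch. It is not the FACIT-SME method: the toy `GLOSSARY`, the function names, the simplified Lesk-style gloss overlap used for disambiguation, and the Jaccard overlap used for scoring are all assumptions chosen for illustration.

```python
import re

# Hypothetical toy glossary (an assumption, not a real knowledge source):
# each ambiguous term maps to candidate senses, each with a short gloss.
GLOSSARY = {
    "bug": {
        "bug.defect": "error or defect in software code that causes a failure",
        "bug.insect": "small insect or similar creeping animal",
    },
    "test": {
        "test.software": "procedure that checks software code for a defect or failure",
        "test.exam": "written exam taken by students at school",
    },
}

STOPWORDS = {"a", "an", "the", "in", "of", "or", "that", "for", "at", "by", "and", "to", "is"}


def tokens(text):
    """Lowercased set of alphabetic words, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def disambiguate(term, context):
    """Pick the sense whose gloss shares the most words with the context.

    Simplified Lesk-style heuristic; on a tie the first listed sense wins.
    """
    senses = GLOSSARY.get(term)
    if not senses:
        return None
    ctx = tokens(context)
    return max(senses, key=lambda s: len(tokens(senses[s]) & ctx))


def sense_set(text):
    """Senses of all glossary terms in the text, using the text as context."""
    return {disambiguate(w, text) for w in tokens(text) if w in GLOSSARY}


def sense_overlap(query, document):
    """Jaccard overlap between the disambiguated senses of query and document."""
    q, d = sense_set(query), sense_set(document)
    return len(q & d) / len(q | d) if q | d else 0.0


query = "fix the bug in the software code"
software_doc = "the bug is an error in software that causes a failure"
insect_doc = "a small insect, a bug, on a leaf"

print(sense_overlap(query, software_doc))
print(sense_overlap(query, insect_doc))
```

Both documents contain the keyword "bug", so a purely syntactic match would return both for this query. In this sketch only the software document shares the query's sense of the term, which is the kind of distinction WSD is meant to provide.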

FACIT-SME Solution and Semantic Helper Overview
Keyword Extraction and Enhancement
The Semantic Glossary
Semantic Similarity Computation
Beyond Term Ambiguities
Sense-Aware Extensions to the Keyword Extraction and Enhancement
Sense-Aware Semantic Similarity
Sense-Aware Techniques in Practice
Related Work
Experimental Evaluation
Effectiveness of All-Senses Techniques
Impact on Effectiveness of Sense-Aware Techniques
Detailed Ranking Effectiveness Evaluation
Concluding Remarks
