Mining Unstructured Data in Software Repositories: Current and Future Trends

Gabriele Bavota

doi:10.1109/saner.2016.47

Abstract

The amount of unstructured data available to software engineering researchers in versioning systems, issue trackers, archived communications, and many other repositories is continuously growing. Mining such data represents an unprecedented opportunity for researchers to investigate new research questions and to build a new generation of recommender systems supporting development and maintenance activities. This paper surveys work on the application of Mining Unstructured Data (MUD) in software engineering. It briefly reviews the types of unstructured data available to researchers, providing pointers to basic mining techniques for exploiting them. It then gives an overview of existing applications of MUD in software engineering, with a specific focus on textual data present in software repositories and code components. The paper also discusses perils the miner should avoid while mining unstructured data and lists possible future trends for the field.

Full Text