Abstract

Source control systems store changes to the source code as development progresses, defect tracking systems follow the resolution of software defects, and archived communications between project personnel record the rationale for decisions throughout the life of a project. The data in these software repositories is available for most large software projects and represents a detailed and rich record of the historical development of software systems. Until recently, these repositories were used primarily for their intended activities, such as maintaining versions of the source code or tracking the status of a defect. Software practitioners and researchers are beginning to recognize the potential benefit of mining this information for other purposes. Research is now proceeding to uncover the ways in which mining these repositories can help support the maintenance of software systems, improve software design and reuse, and empirically validate novel ideas and techniques.
Unfortunately, software repositories are not designed to facilitate empirical understanding of a software project. The information recorded by software tools such as version control and defect tracking systems tends to contain various anomalies and other issues. Tools may be used differently in different projects, and different tools are often used in different organizations. More importantly, quantities of interest, such as development effort, are usually not captured directly. A key challenge is to build useful theories and models of software development that can be empirically validated using the information in software repositories. There has been progress in overcoming some of these challenges by constructing tools and developing methods to extract, clean, and validate information from software repositories.
A workshop on mining software repositories (MSR 2004: http://msr.uwaterloo.ca) was held on 25 May 2004 in Edinburgh, UK, in conjunction with the 26th IEEE International Conference on Software Engineering (ICSE). A follow-up workshop (MSR 2005) was held on 17 May 2005 in Saint Louis, Missouri, in conjunction with the 27th ICSE. Each workshop had submissions from over 14 countries. Both workshops had over 50 attendees, making the MSR workshop series the most attended ICSE workshop for the last two years. These workshops have established a community of researchers and practitioners who are working to recover and use the data stored in software repositories to further the understanding of software development practices.
