Abstract

Software engineers can measure the quality of their code with a variety of metrics derived directly from analysis of the source code. These internal quality metrics are valuable to engineers, but the organizations funding software development place more value on external quality metrics such as defect rates and the time needed to develop features. Unfortunately, external quality metrics can be calculated only after the software has been developed, at considerable cost, and deployed to end users. Here, we present a method for mining data from freely available open source Java codebases to train a Random Forest classifier that predicts, with over 75% accuracy, which files are likely to be external quality hotspots based on their internal quality metrics. We also used the trained model to predict hotspots in a Java project whose data was not used to train the classifier and again achieved over 75% accuracy, demonstrating the method's general applicability to different projects.
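
To make the train-then-evaluate-on-an-unseen-project idea concrete, the following is a minimal sketch and not the authors' pipeline. Everything in it is an assumption made for illustration: the three metric names, the synthetic data generator, the rule used to label a file as a hotspot, and the classifier itself, which is a deliberately simplified stand-in for a Random Forest (bagged one-split trees with random feature subsets, written in plain Python so the example has no dependencies).

```python
import random

# Hypothetical internal metrics per file; the actual feature set is not stated in the abstract.
FEATURES = ("cyclomatic_complexity", "lines_of_code", "coupling")


def make_project(seed, n_files=100):
    """Generate synthetic (metrics, is_hotspot) rows standing in for one mined project."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n_files):
        complexity = rng.uniform(1, 40)
        loc = 10 * complexity + rng.uniform(-50, 50)  # correlated with complexity
        coupling = rng.uniform(0, 10)                 # deliberately uninformative
        rows.append((complexity, loc, coupling))
        labels.append(1 if complexity > 20 else 0)    # assumed labeling rule, illustration only
    return rows, labels


def fit_stump(rows, labels, candidate_features):
    """Pick the (feature, threshold, polarity) split with the fewest training errors."""
    best = None
    for f in candidate_features:
        for threshold in sorted({r[f] for r in rows}):
            # Errors when predicting "hotspot" for values above the threshold.
            errors = sum((r[f] > threshold) != (y == 1) for r, y in zip(rows, labels))
            # Polarity 0 flips the prediction, which also flips the error count.
            for polarity, err in ((1, errors), (0, len(rows) - errors)):
                if best is None or err < best[0]:
                    best = (err, f, threshold, polarity)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, polarity = stump
    above = row[f] > threshold
    return int(above) if polarity == 1 else int(not above)


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample using a random subset of two features."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        feats = rng.sample(range(len(FEATURES)), 2)
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Majority vote over all stumps."""
    votes = sum(predict_stump(s, row) for s in forest)
    return 1 if 2 * votes > len(forest) else 0


def accuracy(forest, rows, labels):
    hits = sum(predict_forest(forest, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)


# Train on two synthetic projects, then evaluate on a third the model has never seen.
train_rows, train_labels = [], []
for project_seed in (1, 2):
    r, y = make_project(project_seed)
    train_rows += r
    train_labels += y
forest = fit_forest(train_rows, train_labels)
held_out_rows, held_out_labels = make_project(3)
cross_project_accuracy = accuracy(forest, held_out_rows, held_out_labels)
print(f"held-out project accuracy: {cross_project_accuracy:.2f}")
```

The point of the sketch is the evaluation protocol: the held-out project contributes no rows to training, so its accuracy estimates how the model behaves on a codebase it has not seen. Because the data here is synthetic and cleanly separable, the printed number says nothing about the accuracy reported above.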
