Software is data too

Andrian Marcus

doi:10.1145/1862372.1862374

Abstract

Software systems are designed and engineered to process data. However, software is data too. The size and variety of today's software artifacts and the multitude of stakeholder activities result in so much data that individuals can no longer reason about all of it. Software evolution is no longer just about writing code, it is becoming an information management problem.Analysis and management of the software data are activities that software engineers are not trained to do. We have to look for solutions outside software engineering, adopt them, and make them our own. These solutions can come from data mining, information retrieval, machine learning, statistical analysis, etc. This is not the first time software engineers are looking at such solutions. It has been going on for about two decades, in a form or another. The results so far indicate that software engineering is facing a paradigm shift, where more and more software engineering tasks are reinterpreted as optimization, search, retrieval, or classification problems. Despite this experience, applications of data analysis, data integration, and data mining in software engineering are in their infancy by comparison with other research fields. New research is needed to adapt existing algorithms and tools for software engineering data and processes, and new ones will have to be created. This research has to be supported by integration with software development processes and with education as well. More than that, in order for this type of research to succeed, it should be supported with new approaches to empirical work, where data and results are shared globally among researchers and practitioners.The talk will focus on arguing for and mapping out (part of) this research agenda, while looking back at (some of) the existing work in the area.

Full Text