Abstract
Machine learning is practical for software engineering problems, even in data-starved domains. When data is scarce, knowledge can be farmed from seeds; i.e., minimal and partial descriptions of a domain. These seeds can be grown into large datasets via Monte Carlo simulations. The datasets can then be harvested using machine learning techniques. Examples of this knowledge farming approach, and of the associated technique of data mining, are given from numerous software engineering domains.

Machine learning (ML) is not hard. Machine learners automatically generate summaries of data or existing systems in a smaller form. Software engineers can use machine learners to simplify systems development. This chapter explains how to use ML to assist in the construction of systems that support classification, prediction, diagnosis, planning, monitoring, requirements engineering, validation, and maintenance.

This chapter approaches machine learning with three specific biases. First, we will explore machine learning in data-starved domains. Machine learning is typically proposed for domains that contain large datasets. Our experience strongly suggests that many domains lack such large datasets. This lack of data is particularly acute for newer, smaller software companies. Such companies lack the resources to collect and maintain such data. Also, they have not been developing products long enough to collect an appropriately large dataset. When we cannot mine data, we show how to farm knowledge by growing datasets from domain models.

Second, we will only report mature machine learning methods; i.e., those methods which do not require highly specialized skills to execute. This second bias rules out some of the more exciting work on the leading edge of machine learning research (e.g., Horn-clause learning).

Third, in the author’s view, it has yet to be shown empirically from realistic examples that a particular learning technique is necessarily better than the others.
When faced with arguably equivalent techniques, Occam’s razor suggests we use the simplest. Hence, we will explore simple decision tree learners in this chapter. Decision tree learners execute very quickly and are widely used: many of the practical SE applications of machine learning use decision tree learners like C4.5 [33] or CART.

Note: for evidence that no particular learning technique is necessarily better than the others, see the comparisons of different learning methods in [34, 17, 36].
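
The farm-then-harvest idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the chapter: the `seed_model` function, its attribute names, and its scoring rule are a hypothetical partial description of a software project, and the learner is a small ID3-style information-gain tree standing in for a mature tool such as C4.5 or CART.

```python
import math
import random
from collections import Counter

# Hypothetical "seed": a minimal, partial description of a domain.
# Attribute names, value ranges, and the scoring rule are illustrative only.
def seed_model(rng):
    x = {
        "team_experience": rng.choice(["low", "medium", "high"]),
        "requirements_volatility": rng.choice(["low", "high"]),
        "schedule_pressure": rng.choice(["low", "high"]),
    }
    risk = {"low": 2, "medium": 1, "high": 0}[x["team_experience"]]
    risk += 2 if x["requirements_volatility"] == "high" else 0
    risk += 1 if x["schedule_pressure"] == "high" else 0
    risk += rng.gauss(0, 0.5)  # uncertainty in the seed, expressed as noise
    return x, ("overrun" if risk >= 2.5 else "on_time")

# "Grow" the seed into a dataset with Monte Carlo simulation.
def grow(n, seed=1):
    rng = random.Random(seed)
    return [seed_model(rng) for _ in range(n)]

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

# Pick the attribute with the highest information gain.
def best_attribute(rows, attributes):
    base = entropy([y for _, y in rows])
    def gain(a):
        remainder = 0.0
        for v in set(x[a] for x, _ in rows):
            subset = [y for x, y in rows if x[a] == v]
            remainder += len(subset) / len(rows) * entropy(subset)
        return base - remainder
    return max(attributes, key=gain)

# "Harvest" the dataset: a leaf is a class label, an internal node is
# a tuple (attribute, branches, majority_label).
def learn_tree(rows, attributes, depth=0, max_depth=3):
    labels = [y for _, y in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes or depth == max_depth:
        return majority
    a = best_attribute(rows, attributes)
    rest = [b for b in attributes if b != a]
    branches = {}
    for v in set(x[a] for x, _ in rows):
        subset = [(x, y) for x, y in rows if x[a] == v]
        branches[v] = learn_tree(subset, rest, depth + 1, max_depth)
    return (a, branches, majority)

def classify(tree, x):
    while isinstance(tree, tuple):
        a, branches, majority = tree
        tree = branches.get(x[a], majority)  # unseen value: use majority
    return tree

if __name__ == "__main__":
    train, holdout = grow(2000, seed=1), grow(500, seed=2)
    tree = learn_tree(train, list(train[0][0].keys()))
    hits = sum(classify(tree, x) == y for x, y in holdout)
    print("holdout accuracy:", hits / len(holdout))
```

The learned tree is a compact summary of the simulated behaviour: each path from root to leaf reads as a rule over the seed's attributes. In practice, the hand-written learner would likely be replaced by an established decision tree implementation.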