Abstract

In real-world practice, software systems are often built without any explicit upfront model. This can cause serious problems that hinder the almost inevitable future evolution of the system, since at best the only documentation of the software is in the form of source code comments. To address this problem, research has focused on the automatic inference of models by applying machine learning algorithms to execution logs. However, the logs generated by a real software system may be very large, and the inference algorithm can exceed the processing capacity of a single computer. This paper proposes a scalable, general approach to the inference of behavior models that can handle large execution logs via parallel and distributed algorithms implemented using the MapReduce programming model and executed on a cluster of interconnected execution nodes. The approach consists of two distributed phases that perform trace slicing and model synthesis. For each phase, a distributed algorithm using MapReduce is developed. With the parallel data processing capacity of MapReduce, the problem of inferring behavior models from large logs can be solved efficiently. The technique is implemented on top of Hadoop. Experiments on Amazon clusters show the efficiency and scalability of our approach.

• A distributed trace slicing algorithm using MapReduce.
• A distributed model synthesis algorithm using MapReduce.
• A novel approach for inferring software behavior models with MapReduce.
• Experimental results show promising performance of this approach.
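The two phases named above can be illustrated with a minimal single-process sketch. This is not the paper's algorithm, which the abstract does not spell out: the log format (timestamp, object id, event name), the slicing key (object id), and the synthesized model (a table of transition counts between consecutive events) are all assumptions made for illustration, and `run_mapreduce` is a hypothetical in-memory stand-in for a Hadoop job.

```python
from collections import defaultdict

# Hypothetical log records: (timestamp, object_id, event_name).
LOG = [
    (1, "f1", "open"),
    (2, "f2", "open"),
    (3, "f1", "read"),
    (4, "f1", "close"),
    (5, "f2", "close"),
]

def run_mapreduce(records, mapper, reducer):
    """In-memory stand-in for a MapReduce job: map, shuffle by key, reduce."""
    shuffled = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            shuffled[key].append(value)
    output = []
    for key in sorted(shuffled):
        output.extend(reducer(key, shuffled[key]))
    return output

# Phase 1 (assumed form of trace slicing): group events by object id,
# then order each group by timestamp to obtain one trace per object.
def slice_mapper(record):
    timestamp, object_id, event = record
    yield object_id, (timestamp, event)

def slice_reducer(object_id, values):
    yield object_id, [event for _, event in sorted(values)]

# Phase 2 (assumed form of model synthesis): count transitions between
# consecutive events across all traces, with artificial start/end states.
START, END = "<start>", "<end>"

def synth_mapper(trace_record):
    _, events = trace_record
    path = [START] + events + [END]
    for src, dst in zip(path, path[1:]):
        yield (src, dst), 1

def synth_reducer(transition, counts):
    yield transition, sum(counts)

traces = run_mapreduce(LOG, slice_mapper, slice_reducer)
model = dict(run_mapreduce(traces, synth_mapper, synth_reducer))
```

Here `traces` holds one event sequence per object and `model` maps each observed transition to its frequency. On a real cluster, the mappers and reducers would run on different nodes and the shuffle would be handled by the framework rather than by an in-memory dictionary.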
