Evaluation of Expectation Maximization Based Clustering Approach for Reusability Prediction of Function based Software systems

Er Gurbhej Singh, Himani Goel

doi:10.5120/1308-1705

Abstract

In this study, an Expectation Maximization (EM) based clustering approach is evaluated for reusability prediction of function-based software systems, using a metric-based approach. The output attribute of the function-oriented dataset is the reusability value, expressed as one of six numeric labels: 1, 2, 3, 4, 5 and 6. Label 1 represents Nil reusability and label 6 represents Excellent reusability.

A framework of metrics is used to target the essential attributes of function-oriented features for measuring the reusability of software modules. The following metrics were analyzed, refined and used to explore different structural dimensions of function-oriented components: Cyclomatic Complexity using McCabe's measure, the Halstead Software Science Indicator, the Regularity Metric, the Reuse-Frequency Metric and the Coupling Metric. The input attributes are expressed in three linguistic labels: 1 (Low), 2 (Medium) and 3 (High). These five metrics are used as input, and clusters are formed using EM. EM assigns a probability distribution to each instance, indicating the probability of it belonging to each of the clusters. The 10-fold cross-validation performance of the system is then recorded.

The results are expressed as Precision, Recall and Accuracy values. Precision for a class is the number of true positives (items correctly labeled as belonging to the positive class) divided by the total number of items labeled as belonging to the positive class (the sum of true positives and false positives, the latter being items incorrectly labeled as belonging to the class). Recall is the number of true positives divided by the total number of items that actually belong to the positive class (the sum of true positives and false negatives, the latter being items that were not labeled as belonging to the positive class but should have been). Precision can therefore be seen as a measure of exactness or fidelity, whereas Recall is a measure of completeness. Accuracy is the percentage of predicted values that match the expected reusability values for the given data.

The results show that the Precision and Recall values of the sixth-level reusability class are the highest, meaning that the system is able to detect the “Excellent” components precisely. The Precision and Recall values of the fourth-level reusability class are the second best, meaning that the system is able to detect the “Good” components with good Precision. The proposed technique shows an Accuracy of approximately 60%, which is considered satisfactory for using EM-based clustering to predict function-based reusable modules from an existing reservoir of software components.

The proposed approach is applied to C-based software modules/components and can be extended to Artificial Intelligence (AI) based software components, e.g. components written in Prolog. The proposed metric framework could also be tried for calculating the fault tolerance of software components.
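The definitions of Precision, Recall and Accuracy given above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `per_class_metrics` is hypothetical, the label lists are invented toy values rather than data from the study, and returning 0.0 for a class with no predicted or no actual instances is an assumption made here for convenience.

```python
from collections import Counter


def per_class_metrics(expected, predicted):
    """Return ({label: (precision, recall)}, accuracy) for two label lists.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for e, p in zip(expected, predicted):
        if e == p:
            tp[e] += 1          # correctly labeled as class e
        else:
            fp[p] += 1          # wrongly labeled as class p
            fn[e] += 1          # should have been labeled as class e

    metrics = {}
    for label in sorted(set(expected) | set(predicted)):
        labeled_as = tp[label] + fp[label]      # TP + FP
        actually = tp[label] + fn[label]        # TP + FN
        # Assumption: report 0.0 when the denominator is empty.
        precision = tp[label] / labeled_as if labeled_as else 0.0
        recall = tp[label] / actually if actually else 0.0
        metrics[label] = (precision, recall)

    accuracy = sum(tp.values()) / len(expected) if expected else 0.0
    return metrics, accuracy


if __name__ == "__main__":
    # Invented reusability labels (1 = Nil ... 6 = Excellent), for illustration only.
    expected = [6, 6, 4, 4, 1, 2]
    predicted = [6, 6, 4, 3, 1, 1]
    metrics, accuracy = per_class_metrics(expected, predicted)
    for label, (precision, recall) in metrics.items():
        print(f"label {label}: precision={precision:.2f} recall={recall:.2f}")
    print(f"accuracy={accuracy:.2%}")
```

In a 10-fold cross-validation setting, one option would be to call such a function on the pooled predictions of all folds, although the abstract does not say how the per-fold results were combined.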
