Abstract

In today’s world, data is produced at a phenomenal rate, and this ever-growing data must be stored almost daily. Although our ability to store such huge volumes of data has grown, the problem arises when users expect sophisticated information from it. This can be achieved by uncovering the information hidden in the raw data, which is the purpose of data mining. Data mining, or knowledge discovery, is the computer-assisted process of digging through and analyzing enormous sets of data and then extracting meaning from them. The raw, unlabeled data present in large databases can initially be classified in an unsupervised manner using cluster analysis. Cluster analysis is the process of finding groups of objects such that objects in the same group are similar to one another and dissimilar from the objects in other groups. These groups are known as clusters; in other words, clustering organizes data objects into groups whose members share some similarity. Applications of clustering include marketing (finding groups of customers with similar behavior), biology (classifying plants and animals given their features), data analysis, earthquake studies (observing earthquake epicenters to identify dangerous zones), and the WWW (document classification). The results and efficiency of the clustering process depend on the clustering algorithm used. The aim of this research paper is to compare two important clustering algorithms, namely the centroid-based K-means and X-means. The performance of the algorithms is evaluated over different program executions on the same input dataset, and is analyzed and compared on the basis of the quality of the clustering outputs, the number of iterations, and cut-off factors.

Highlights

  • For any software product, maintenance is considered one of the most important phases of the software development life cycle

  • Maintainability is treated as a quality attribute because the cost of performing maintenance activities on a software product constitutes the largest share of cost in present-day software development

  • The ISO/IEC 9126 [1] standard defines maintainability as the capability of the software product to be modified, including corrections, improvements, or adaptations of the software to changes in environment and in requirements and functional specifications


Summary

INTRODUCTION

Maintenance is considered one of the most important phases of the software development life cycle. The advantage of data mining technology is that it can process large amounts of data efficiently and extract useful information from them. This property of data mining has helped to improve software maintenance [4]-[6]. Besides general applications such as image processing, data analysis, marketing, and the WWW, the clustering technique is also used to evaluate and improve the maintainability of software systems. Various clustering algorithms are available for this purpose.

CLUSTERING
  Partitioning Methods
  Hierarchical Methods
  Grid-Based Methods
  Model-Based Methods
  Clustering High-Dimensional Data
  Constraint-Based Clustering
K-MEANS
  The K-means Algorithm
X-MEANS
  The X-means Algorithm
EXPERIMENTAL DISCUSSIONS
  Implementation of X-means on the QUES Data Set
  Comparison of K-means and X-means
CONCLUSION AND FUTURE SCOPE
