Enhancing the results of data clustering for hybrid (numeric and categorical) and missing data is paramount for robust pattern recognition and decision-making in various fields. Traditional clustering algorithms often struggle with heterogeneous data types and incomplete information, which leads to suboptimal groupings and potentially biased insights. By addressing these challenges, advanced techniques such as bioinspired algorithms can provide more accurate and comprehensive clustering. We show the usefulness of internal validity indices for obtaining high-quality clusters from hybrid and incomplete data. In addition, this work analyzes how applying the recently proposed PAntSA algorithm influences the performance of hybrid and incomplete data clustering algorithms. To do this, the results of different algorithms are compared before and after applying PAntSA. The statistical analysis of the results provides experimental evidence that PAntSA improves the quality of the groups obtained by hybrid and incomplete data clustering methods.
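For illustration only, here is a minimal sketch of one common internal validity index, the silhouette, computed over hybrid data with missing values. It assumes a Gower-style dissimilarity: numeric attributes are scaled by their observed range, categorical attributes use a 0/1 mismatch, and any attribute missing in either object is skipped. The choice of index, this distance, and all function names are assumptions made for this example. They are not taken from the work described above, and PAntSA itself is not shown.

```python
from typing import List, Optional, Sequence, Union

Value = Optional[Union[float, str]]  # None marks a missing value


def feature_ranges(data: Sequence[Sequence[Value]], is_numeric: Sequence[bool]) -> List[float]:
    """Range of each numeric attribute over its observed (non-missing) values."""
    ranges = []
    for k, numeric in enumerate(is_numeric):
        vals = [row[k] for row in data if row[k] is not None] if numeric else []
        ranges.append(max(vals) - min(vals) if vals else 0.0)
    return ranges


def gower_distance(x, y, is_numeric, ranges) -> float:
    """Assumed Gower-style dissimilarity in [0, 1]; attributes missing in x or y are ignored."""
    total, compared = 0.0, 0
    for k, (a, b) in enumerate(zip(x, y)):
        if a is None or b is None:
            continue  # missing value: attribute gets zero weight
        if is_numeric[k]:
            total += abs(a - b) / ranges[k] if ranges[k] > 0 else 0.0
        else:
            total += 0.0 if a == b else 1.0  # categorical mismatch
        compared += 1
    return total / compared if compared else 1.0  # assumption: no shared attributes = maximal


def silhouette(data, labels, is_numeric) -> float:
    """Mean silhouette width; higher values indicate more compact, better-separated groups."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    ranges = feature_ranges(data, is_numeric)
    n = len(data)
    d = [[gower_distance(data[i], data[j], is_numeric, ranges) for j in range(n)] for i in range(n)]
    scores = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # singleton cluster: silhouette defined as 0
            continue
        a = sum(d[i][j] for j in own) / len(own)  # mean intra-cluster dissimilarity
        b = min(  # mean dissimilarity to the nearest other cluster
            sum(d[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n


# Toy hybrid data (numeric, categorical, numeric) with missing values.
data = [[1.0, "red", 10.0], [1.2, "red", None], [5.0, "blue", 50.0], [None, "blue", 52.0]]
is_numeric = [True, False, True]
good = silhouette(data, [0, 0, 1, 1], is_numeric)  # groups follow the data's structure
bad = silhouette(data, [0, 1, 0, 1], is_numeric)   # groups cut across it
print(good, bad)
```

An index of this kind needs no ground-truth labels, so it can compare the partitions an algorithm produces before and after a refinement step: the higher score marks the more compact and better-separated grouping.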