Classification methods for purely numerical or purely categorical data are well developed. However, data collected in the real world are frequently of mixed type, containing both numerical and categorical values, and classifying such mixed data quickly and efficiently remains a critical yet challenging task. Existing classification models for mixed data usually treat mixed data processing and the subsequent classification as two independent phases, without considering their compatibility. By fusing mixed data processing into the classification algorithm, this paper proposes an extended version of RBF-ELM (Radial Basis Function-Extreme Learning Machine), called Mixed Data RBF-ELM (MD-RBF-ELM for short), which achieves direct, fast, and efficient classification of mixed data. Specifically, a distance metric for mixed data is first designed to calculate the distances between the input data and the RBF centers; these distances are then used to train the network structure and weights of MD-RBF-ELM, thereby fusing data processing with model learning. In addition, because randomly selecting the RBF centers makes the performance of MD-RBF-ELM unstable, we propose an improved density peak clustering algorithm and use it to select the optimal RBF centers automatically and adaptively. Extensive experimental results on 34 data sets demonstrate that MD-RBF-ELM significantly enhances classification performance compared with seven state-of-the-art competitors: it improves the F1-score by 2.37%, achieves the best result on 14 of the 34 data sets, and reaches an average rank of 2.4 out of 8.
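To make the pipeline described above concrete, the following is a minimal sketch of an RBF-ELM whose hidden layer is driven by a mixed-data distance. It is not the paper's implementation. The abstract does not specify the distance metric, so the sketch assumes squared Euclidean distance on numerical columns plus a Hamming mismatch count on integer-coded categorical columns. The centers are drawn at random here, which is the baseline the paper sets out to improve on; the improved density peak clustering step is not reproduced. The names `mixed_sq_distance`, `fit_rbf_elm`, `predict_rbf_elm`, and the `gamma` width parameter are hypothetical.

```python
import numpy as np


def mixed_sq_distance(X_num, X_cat, C_num, C_cat):
    """Pairwise squared distance between samples and centers.

    Assumption (not from the paper): squared Euclidean distance on the
    numerical columns plus the number of mismatched categorical columns.
    """
    d_num = ((X_num[:, None, :] - C_num[None, :, :]) ** 2).sum(axis=2)
    d_cat = (X_cat[:, None, :] != C_cat[None, :, :]).sum(axis=2)
    return d_num + d_cat


def fit_rbf_elm(X_num, X_cat, y, n_centers=4, gamma=1.0, seed=0):
    """Train an RBF-ELM on mixed data; y holds integer labels 0..K-1."""
    rng = np.random.default_rng(seed)
    # Random center selection (the paper replaces this step with clustering).
    idx = rng.choice(len(y), size=n_centers, replace=False)
    C_num, C_cat = X_num[idx], X_cat[idx]
    # Hidden-layer outputs: Gaussian RBF of the mixed distance.
    H = np.exp(-gamma * mixed_sq_distance(X_num, X_cat, C_num, C_cat))
    # One-hot targets, then output weights by least squares (pseudoinverse).
    T = np.eye(int(y.max()) + 1)[y]
    beta = np.linalg.pinv(H) @ T
    return C_num, C_cat, beta, gamma


def predict_rbf_elm(model, X_num, X_cat):
    """Return the predicted class index for each sample."""
    C_num, C_cat, beta, gamma = model
    H = np.exp(-gamma * mixed_sq_distance(X_num, X_cat, C_num, C_cat))
    return (H @ beta).argmax(axis=1)
```

In this sketch the categorical columns never pass through a separate encoding step: they enter the model only through the distance to each center, which is one plausible reading of fusing data processing with model learning. Only the output weights are solved for, in closed form, as is usual for ELM-style training.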