Student performance is a vital element of institutions of higher learning, both for the students themselves and for the academic community as a whole. Higher education institutions therefore need to be flexible in the performance metrics and ideas they adopt. However, as the student population grows through sessional admissions, obtaining accurate information on students' performance becomes more difficult because of the large volume of data held in educational databases (records spanning roughly 1 to 100 years). Clustering is one of the data mining techniques used to examine large amounts of data: it organizes data into clusters so that items that are similar according to certain criteria are placed in the same cluster. Several methods for improving the performance of the K-means clustering algorithm in big data analysis have been proposed in the literature; compared with these, the proposed modified K-means clustering algorithm is less time-consuming, more efficient, and less complex, and, most importantly, it produces better clusters. The modified K-means method is used to categorize numerical data. However, the data in each cluster may be affected by outliers and noise, which can reduce the accuracy of data matching: pattern matching cannot readily predict the cluster center and therefore cannot characterize the data in the cluster. To address this issue and assess the results produced, a modified K-means clustering method is proposed that is suitable for large data from social media, sensors, search engines, GPS, transaction/financial records, satellites, e-commerce sites, and other sources.
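For readers unfamiliar with the baseline, the standard K-means procedure (Lloyd's algorithm) that the proposed method builds on can be sketched as follows. This is a minimal illustrative sketch, not the authors' modified algorithm, whose details are not given in this abstract; the function names `kmeans` and `centroid_of`, the random initialization, and the student-score data are hypothetical assumptions. Note that the centroid update uses the arithmetic mean, which is why a single outlier can pull a cluster center away from the bulk of its members.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid_of(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points (hypothetical helper)."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Standard (Lloyd's) K-means on numerical data.

    Sketch only: NOT the modified K-means proposed in the abstract.
    Returns (centroids, labels), where labels[i] is the cluster of points[i].
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Assumed initialization: sample k distinct input points as centroids.
    centroids: List[Point] = [tuple(p) for p in rng.sample(list(points), k)]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = centroid_of(members)
    return centroids, labels


if __name__ == "__main__":
    # Hypothetical (coursework %, exam %) pairs for six students.
    scores = [(45, 40), (50, 42), (48, 45), (85, 88), (90, 92), (88, 86)]
    centroids, labels = kmeans(scores, k=2)
    print("labels:", labels)
    print("centroids:", centroids)
```

The outlier sensitivity follows directly from the mean-based update: `centroid_of([(50.0,), (52.0,), (54.0,), (500.0,)])` returns `(164.0,)`, so one extreme value drags the center far from the three typical scores.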