A Grid-Based Scalable Classifier for High Dimensional Datasets

Sheetal Saini, Sumeet Dua

doi:10.1007/978-3-642-12035-0_42

Abstract

High dimensionality and large dataset size are two common characteristics of real-world datasets and databases. These characteristics pose unique challenges for the classification of such datasets. Classification algorithms that perform well (in terms of scalability and efficiency) on small and medium datasets with moderate dimensionality fail to scale to large, high-dimensional datasets. Therefore, in this paper, we propose a scalable classifier to cope with large and high-dimensional datasets. The proposed method inherits its scalability from the concept of grid-based partitioning. Our goals in using this method are to divide the data space into small partitions called cells and to map the data onto the partitioned data space. Thus, instead of managing the individual data points, the method operates on abstract entities called cells, which decreases the classification runtime for large and high-dimensional datasets. The presented experimental results demonstrate the scalability and efficiency of our algorithm.

Keywords: classification, scalability, grid-based
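The idea of mapping data points onto cells and then working with the cells can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper: the equal-width binning, the majority-vote rule per cell, the fallback to the overall majority class for empty cells, and all names (`cell_of`, `fit_grid`, `predict`, `bins`) are assumptions made here purely for illustration.

```python
from collections import Counter, defaultdict


def cell_of(point, mins, widths, bins):
    """Map a point to a tuple of per-dimension cell indices (equal-width bins)."""
    idx = []
    for x, lo, w in zip(point, mins, widths):
        i = int((x - lo) / w) if w > 0 else 0
        idx.append(min(max(i, 0), bins - 1))  # clamp values on or outside the grid edge
    return tuple(idx)


def fit_grid(points, labels, bins=4):
    """Partition the data space into cells and store class counts per occupied cell."""
    dims = len(points[0])
    mins = [min(p[d] for p in points) for d in range(dims)]
    maxs = [max(p[d] for p in points) for d in range(dims)]
    widths = [(hi - lo) / bins for lo, hi in zip(mins, maxs)]
    cells = defaultdict(Counter)
    for p, y in zip(points, labels):
        cells[cell_of(p, mins, widths, bins)][y] += 1
    # Assumed fallback: the most frequent class overall, used for empty cells.
    default = Counter(labels).most_common(1)[0][0]
    return {"mins": mins, "widths": widths, "bins": bins,
            "cells": dict(cells), "default": default}


def predict(model, point):
    """Classify a point by the majority class of the cell it falls into."""
    key = cell_of(point, model["mins"], model["widths"], model["bins"])
    counts = model["cells"].get(key)
    if counts is None:
        return model["default"]
    return counts.most_common(1)[0][0]


# Toy data: five 2-D points, two classes, a 2 x 2 grid.
points = [(0.0, 0.0), (0.1, 0.2), (0.9, 1.0), (1.0, 0.8), (0.95, 0.9)]
labels = ["a", "a", "b", "b", "b"]
model = fit_grid(points, labels, bins=2)
print(predict(model, (0.2, 0.1)))  # falls into cell (0, 0)
print(predict(model, (0.8, 0.9)))  # falls into cell (1, 1)
```

Only occupied cells are stored, so in this sketch prediction costs one cell lookup regardless of how many training points were mapped onto the grid. A dense grid would grow exponentially with the number of dimensions, which is why the sketch keeps a dictionary keyed by cell index rather than an array.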
