High dimensional nearest neighbor search considering outliers based on fuzzy membership

Gend Lal Prajapati, Rupali Bhartiya

doi:10.1109/sai.2017.8252127

Abstract

Most real-world data are high dimensional. Such data are used in many practical applications, including medical image processing, geographical information systems, and pattern recognition. Nearest Neighbor search is one of the important methods for finding data that match a query. Because these data sets are very large, storing and searching them is complex. Multiple dimensions are used to obtain accurate results. Several methods are available for searching in high dimensions, but most of them rely on dimension reduction. Available exact Nearest Neighbor search algorithms are either inefficient in search time or storage space, or efficient only for a small number of dimensions. This type of problem in high-dimensional data is called the curse of dimensionality. Eliminating unimportant dimensions is not always appropriate when exact search results are needed, for example in Nearest Neighbor search for health monitoring, fault detection, or intrusion detection, because of the risk of eliminating important data that might be useful for the search. High-dimensional data require effective search results as well as efficiency in time and space. Outlier removal is also a form of data elimination. In this paper, we propose an algorithm for Nearest Neighbor search in high dimensions that addresses the problem of uneven and rigid clustering by removing outliers from the clusters without losing data. This is done by clustering the data over all dimensions and using the outlier data for search alongside the smoothly clustered data. The proposed algorithm combines the advantages of the K-means and Fuzzy C-Means (FCM) algorithms for clustering. Cluster centroids and outlier memberships are indexed to obtain better search results in less time. The algorithm finds the exact search result in less time and generates better data clusters. The algorithm is implemented in MATLAB and Java. Results show that the proposed algorithm generates better clusters, makes use of the useful data in outliers, and performs Nearest Neighbor search in less time and more accurately than K-means clustering.
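
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes that centroids are already available (for example from K-means), that a point counts as an outlier when its largest standard FCM membership falls below a hypothetical `threshold`, and that exactness is kept by scanning the outlier list directly while pruning whole clusters with a triangle-inequality bound. The names `fcm_memberships`, `build_index`, `nearest`, and `threshold` are illustrative and do not come from the paper.

```python
import math


def dist(a, b):
    """Euclidean distance over all dimensions (no dimension reduction)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fcm_memberships(point, centroids, m=2.0):
    """Standard FCM membership of one point in each cluster (fuzzifier m > 1)."""
    d = [dist(point, c) for c in centroids]
    # A point lying exactly on a centroid belongs fully to that cluster.
    if 0.0 in d:
        hits = [1.0 if x == 0.0 else 0.0 for x in d]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d[i] / d[k]) ** power for k in range(len(d)))
            for i in range(len(d))]


def build_index(points, centroids, threshold=0.8, m=2.0):
    """Assumed index layout: crisp clusters plus a separate outlier list.

    Points whose best membership is below `threshold` (a hypothetical
    parameter) are kept as outliers together with their memberships,
    so no data is discarded.
    """
    clusters = [[] for _ in centroids]
    outliers = []
    for p in points:
        u = fcm_memberships(p, centroids, m)
        best = max(range(len(u)), key=u.__getitem__)
        if u[best] >= threshold:
            clusters[best].append(p)
        else:
            outliers.append((p, u))
    # Cluster radius: distance from the centroid to its farthest member.
    radii = [max((dist(p, c) for p in members), default=0.0)
             for members, c in zip(clusters, centroids)]
    return clusters, radii, outliers


def nearest(query, centroids, clusters, radii, outliers):
    """Exact nearest neighbor: scan outliers, then prune clusters by bound."""
    best_point, best_d = None, math.inf
    for p, _ in outliers:
        d = dist(query, p)
        if d < best_d:
            best_point, best_d = p, d
    to_centroid = [dist(query, c) for c in centroids]
    for i in sorted(range(len(centroids)), key=to_centroid.__getitem__):
        # Triangle inequality: no member of cluster i can be closer than
        # dist(query, centroid) - radius, so the cluster can be skipped.
        if to_centroid[i] - radii[i] >= best_d:
            continue
        for p in clusters[i]:
            d = dist(query, p)
            if d < best_d:
                best_point, best_d = p, d
    return best_point, best_d
```

In this sketch, moving low-membership points out of the clusters keeps the cluster radii small, which makes the pruning bound tighter, while the outlier list ensures those points are still considered during search. Calling `build_index(points, centroids)` once and then `nearest(query, centroids, clusters, radii, outliers)` per query returns the same distance as a brute-force scan.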
