Abstract
Deploying solutions that handle critical decision-based problems efficiently requires the processing of high-dimensional data. Over the years, modern technological advancement has led to an unprecedented volume of uncertain data being captured, which has made it necessary to organize such data for better data access performance. To this effect, the use of indexing techniques for supporting, organizing, and storing uncertain data with high dimensionality has become pertinent. However, the choice of an indexing technique to improve search performance is highly influenced by the properties of the underlying data set, the data construction methods employed by the indexing structure, and the query types it supports. This paper conducts an extensive performance analysis of existing indexing techniques, namely the R-tree, R*-tree and X-tree, in order to identify the most efficient indexing structure for organizing, storing and ultimately improving search performance over uncertain data with high dimensionality. The results of the analyses, in terms of CPU processing time and number of nodes visited, clearly show the superiority of the X-tree over the R-tree and R*-tree, and this superiority holds across different data set sizes, data distributions, numbers of dimensions and selectivity ratios.
Highlights
Modern technological advancements in computing today have been responsible for generating or capturing high-dimensional data sets with huge numbers of objects
The quest to process high-dimensional data has resulted in a number of innovative indexing techniques that are useful in many applications, such as Geographical Information Systems (GIS), robotics, environmental protection, metric spaces, medical imaging, and geosciences, as they are geometrically suited to both point and spatial data [2], [3], [5]–[7], [9], [11], [12], [14], [16], [18], [24], [28], [30], [41], [43], [46]
The number of nodes visited by the R-tree structure remains very high for the synthetic data set, while for the real data set it is only slightly higher than that of the R*-tree and X-tree structures. This is due to the optimization heuristic of minimizing margins, which is employed by both the X-tree and R*-tree indexing structures.
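To make the margin-minimization heuristic more concrete, the following Python sketch shows an R*-tree-style node split: for an overflowing node, it first picks the split axis whose candidate distributions have the smallest total margin, then, on that axis, the distribution with the least overlap (ties broken by total area). This is a simplified illustration, not the implementation evaluated in the paper; the function names, the minimum fill `m`, and the representation of boxes as `(lows, highs)` tuples are assumptions, and other parts of the R*-tree (such as forced reinsertion) are omitted.

```python
from typing import Iterator, List, Tuple

# Assumed representation: a box is (lows, highs); a point is a degenerate box (p, p).
Box = Tuple[Tuple[float, ...], Tuple[float, ...]]


def mbr(boxes: List[Box]) -> Box:
    """Minimum bounding rectangle enclosing all boxes."""
    dims = range(len(boxes[0][0]))
    return (tuple(min(b[0][d] for b in boxes) for d in dims),
            tuple(max(b[1][d] for b in boxes) for d in dims))


def margin(box: Box) -> float:
    # Sum of side lengths; proportional to the true margin, so it ranks splits identically.
    return sum(h - l for l, h in zip(box[0], box[1]))


def area(box: Box) -> float:
    result = 1.0
    for l, h in zip(box[0], box[1]):
        result *= h - l
    return result


def overlap(a: Box, b: Box) -> float:
    """Volume of the intersection of two boxes (0.0 if they are disjoint)."""
    result = 1.0
    for al, ah, bl, bh in zip(a[0], a[1], b[0], b[1]):
        side = min(ah, bh) - max(al, bl)
        if side <= 0:
            return 0.0
        result *= side
    return result


def distributions(entries: List[Box], axis: int, m: int) -> Iterator[Tuple[List[Box], List[Box]]]:
    """All splits of an overflowing node along one axis, each group keeping >= m entries."""
    n = len(entries)  # M + 1 entries in the overflowing node
    for bound in (0, 1):  # sort by lower value, then by upper value
        ordered = sorted(entries, key=lambda e: e[bound][axis])
        for k in range(m, n - m + 1):
            yield ordered[:k], ordered[k:]


def choose_split(entries: List[Box], m: int) -> Tuple[int, List[Box], List[Box]]:
    dims = len(entries[0][0])

    # Step 1 (ChooseSplitAxis): the axis whose distributions have the smallest total margin.
    def margin_sum(axis: int) -> float:
        return sum(margin(mbr(g1)) + margin(mbr(g2))
                   for g1, g2 in distributions(entries, axis, m))

    axis = min(range(dims), key=margin_sum)

    # Step 2 (ChooseSplitIndex): on that axis, minimize overlap, breaking ties by total area.
    def cost(split: Tuple[List[Box], List[Box]]) -> Tuple[float, float]:
        b1, b2 = mbr(split[0]), mbr(split[1])
        return overlap(b1, b2), area(b1) + area(b2)

    group1, group2 = min(distributions(entries, axis, m), key=cost)
    return axis, group1, group2
```

For example, with six 2-D points forming two clusters along the x-axis, (0, 0), (1, 0.1), (2, 0) and (10, 0.1), (11, 0), (12, 0.1), and m = 2, `choose_split` selects axis 0 and separates the two clusters into groups of three. The usual rationale for minimizing margin is that it favors squarish boxes, which pack better and are pruned more often during queries.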
Summary
Modern technological advancements in computing today have been responsible for generating or capturing high-dimensional data sets with huge numbers of objects. The choice of indexing structures for organizing large data sets is justified by their ability to support both point and spatial data in general, where spatial data structures require no transformation of the data and improve data storage through better spatial clustering. This is further supported by the work of [12], where the authors demonstrated the performance advantage of index-based structures over sort-based methods when data objects are clustered together in an MBR (minimum bounding rectangle) for effective pruning.
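The pruning effect of clustering objects in MBRs, and the "number of nodes visited" metric used in the analysis, can be illustrated with a minimal range-query sketch. It assumes a hand-built two-level tree with hypothetical `Node`, `leaf` and `internal` helpers rather than any of the insertion algorithms compared in the paper: a child node is visited only if its MBR intersects the query window, and every visited node is counted.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, ...]
Box = Tuple[Point, Point]  # assumed representation: (lows, highs)


def intersects(a: Box, b: Box) -> bool:
    """True if two boxes overlap (or touch) in every dimension."""
    return all(al <= bh and bl <= ah for al, ah, bl, bh in zip(a[0], a[1], b[0], b[1]))


@dataclass
class Node:
    mbr: Box
    children: List["Node"]  # empty for a leaf
    points: List[Point]     # empty for an internal node


def leaf(points: List[Point]) -> Node:
    # Hypothetical helper: wrap a cluster of points in a leaf with its MBR.
    dims = range(len(points[0]))
    box = (tuple(min(p[d] for p in points) for d in dims),
           tuple(max(p[d] for p in points) for d in dims))
    return Node(box, [], points)


def internal(children: List[Node]) -> Node:
    # Hypothetical helper: an internal node whose MBR encloses its children's MBRs.
    dims = range(len(children[0].mbr[0]))
    box = (tuple(min(c.mbr[0][d] for c in children) for d in dims),
           tuple(max(c.mbr[1][d] for c in children) for d in dims))
    return Node(box, children, [])


def range_query(root: Node, window: Box) -> Tuple[List[Point], int]:
    """Return the points inside window and the number of nodes visited."""
    results: List[Point] = []
    visited = 0
    stack = [root]
    while stack:
        node = stack.pop()
        visited += 1
        if node.children:
            # Prune: descend only into children whose MBR meets the window.
            stack.extend(c for c in node.children if intersects(c.mbr, window))
        else:
            results.extend(p for p in node.points if intersects((p, p), window))
    return results, visited


if __name__ == "__main__":
    root = internal([leaf([(0, 0), (1, 1), (2, 0)]),
                     leaf([(10, 10), (11, 12), (12, 11)])])
    hits, visited = range_query(root, ((0, 0), (1.5, 1.5)))
    print(hits, visited)  # prints [(0, 0), (1, 1)] 2
```

Here the query window covers only the lower-left cluster, so the right-hand leaf is pruned and only 2 of the 3 nodes are visited; a window enclosing all data would visit every node. The fraction of stored objects returned (2 of 6 in this example) is one common way to express a query's selectivity.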