Optimizing metric access methods for querying and mining complex data types

Jessica Andressa De Souza, Humberto Luiz Razente, Maria Camila N Barioni

doi:10.1186/s13173-014-0017-5

Open Access

Abstract

Background: Several application scenarios can take advantage of the efficient processing of similarity operations on complex data types, such as multimedia data. These include the execution of more complex query types (e.g., similarity queries) and several well-known data mining algorithms (e.g., data clustering) that are directly based on similarity computations. To speed up the similarity-based comparisons performed by these approaches, the dataset can be stored in specialized data structures known as metric access methods (MAMs).

Methods: In this article, we present four node split policies that can be employed in the construction of the M-tree, the pioneer dynamic MAM, and of the Slim-tree, the M-tree successor.

Results: These policies allow faster tree construction, as they result in a better distribution of elements across the tree nodes and require fewer distance calculations than the previously proposed ones. Furthermore, trees built with these policies have been shown to be more efficient for techniques that require similarity computations, such as nearest-neighbor queries and data clustering algorithms.

Conclusion: The experimental results show that trees built with the proposed policies outperform those built with the original ones with regard to the number of disk accesses, the number of distance calculations, and the time required to run the queries.

Highlights

Several application scenarios can take advantage of the efficient processing of similarity operations on complex data types, such as multimedia data.
To speed up the similarity-based comparisons performed by these approaches, the dataset can be stored in specialized data structures known as metric access methods (MAMs).
The three sets of experiments presented were designed to evaluate MAMs built with the four node split policies for both querying and mining complex datasets.

Summary

Introduction

Several application scenarios can take advantage of the efficient processing of similarity operations on complex data types, such as multimedia data. Examples include the execution of more complex query types (e.g., similarity queries) and several well-known data mining algorithms (e.g., data clustering) that are directly based on similarity computations. To speed up the similarity-based comparisons performed by these approaches, the dataset can be stored in specialized data structures known as metric access methods (MAMs). Because many data mining approaches, such as data clustering, are based on similarity comparisons, they can benefit greatly from the efficient processing of these operations in a MAM.
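To make concrete how a metric index can avoid distance calculations, the sketch below shows the triangle-inequality pruning that MAMs rely on. It is not the M-tree, the Slim-tree, or any of the node split policies discussed in this article. It is a deliberately minimal single-pivot filter, and all names in it (`CountingDistance`, `build_pivot_table`, `range_query`, `linear_scan`) are hypothetical. It assumes 2-D points, Euclidean distance, and an arbitrarily chosen pivot.

```python
import math
import random


class CountingDistance:
    """Wraps a distance function and counts how often it is evaluated."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, a, b):
        self.calls += 1
        return self.fn(a, b)


def build_pivot_table(data, pivot, dist):
    """Precompute d(pivot, x) for every element (done once, at index time)."""
    return [dist(pivot, x) for x in data]


def range_query(data, pivot, pivot_dists, query, radius, dist):
    """Return all x with d(query, x) <= radius, pruning by the triangle inequality."""
    d_qp = dist(query, pivot)
    result = []
    for x, d_px in zip(data, pivot_dists):
        # |d(q, p) - d(p, x)| is a lower bound on d(q, x); if it already
        # exceeds the radius, x cannot qualify and d(q, x) is never computed.
        if abs(d_qp - d_px) > radius:
            continue
        if dist(query, x) <= radius:
            result.append(x)
    return result


def linear_scan(data, query, radius, dist):
    """Baseline: compare the query against every element."""
    return [x for x in data if dist(query, x) <= radius]


# Illustrative data only: 1000 random points in the unit square.
rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(1000)]
dist = CountingDistance(math.dist)
pivot = points[0]  # assumption: any element can serve as the pivot
table = build_pivot_table(points, pivot, dist)

dist.calls = 0  # count only query-time distance calculations
hits = range_query(points, pivot, table, (0.5, 0.5), 0.05, dist)
print(f"{len(hits)} hits using {dist.calls} distance calculations "
      f"(a linear scan needs {len(points)})")
```

A tree-based MAM applies the same bound hierarchically, using the representative element and covering radius of each node to discard whole subtrees. How elements are distributed among nodes when a node splits therefore affects how much can be pruned at query time.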

Methods

Results

Conclusion