Abstract

Abstract Background There are several application scenarios that can take advantage from the efficient processing of similarity operations in complex data types, such as multimedia data. Among them, it is possible to mention the execution of more complex query types (e.g., similarity queries) and several well-known data mining algorithms (e.g., data clustering) that are directly based on similarity computations. In order to speed up the similarity-based comparisons performed by these approaches, it is possible to store the dataset in specialized data structures known as metric access methods (MAM). Methods In this article we present four node split policies that can be employed in the construction of M-tree, the pioneer dynamic MAM, and of Slim-tree, the M-tree successor. Results These policies allow faster tree construction, as they result in better distribution of elements on the tree nodes and require less distance calculations when compared with the previously proposed ones. Furthermore, trees built with these policies have shown to be more efficient for techniques that require similarity computations, such as nearest neighbors queries and data clustering algorithms. Conclusion The experimental results show that trees built with the proposed policies outperform those built with the original ones with regard to the number of disk accesses, the amount of distance calculations, and the time required to run the queries.

Highlights

  • There are several application scenarios that can take advantage from the efficient processing of similarity operations in complex data types, such as multimedia data

  • In order to speed up the similarity-based comparisons performed by these approaches, it is possible to store the dataset in specialized data structures known as metric access methods (MAM)

  • The three sets of experiments presented were designed with the intent to evaluate MAM built with the four node split policies regarding both querying and mining complex datasets

Read more

Summary

Introduction

There are several application scenarios that can take advantage from the efficient processing of similarity operations in complex data types, such as multimedia data. It is possible to mention the execution of more complex query types (e.g., similarity queries) and several well-known data mining algorithms (e.g., data clustering) that are directly based on similarity computations. In order to speed up the similarity-based comparisons performed by these approaches, it is possible to store the dataset in specialized data structures known as metric access methods (MAM). As many data mining approaches, such as data clustering, are based on similarity comparisons, they can greatly benefit from the efficient processing of these operations in a MAM

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call