Abstract

The current Big Data scenario is mainly characterized by the huge amount of data available on the Internet. Existing mechanisms for handling such raw data rely on Data Centres (DCs) with massive storage, memory and processing capacity, in which solutions such as BigTable, MapReduce and Dynamo process information in order to enable its retrieval. The HCube is a DC alternative for data storage and retrieval based on similarity search, in which similar content is concentrated on servers that are physically close within the HCube, simplifying the retrieval of similar data. A similarity search is performed using a primitive get(k, sim), in which k represents the reference content and sim a similarity threshold. The HCube network is organized in a three-dimensional structure, in which the Gray Space Filling Curve (SFC), in conjunction with the Random Hyperplane Hashing (RHH) function and the XOR-based flat routing mechanism, offers an efficient and powerful mechanism for similarity search. In this context, this work presents the HCube networking solution, detailing the benefits of using the Gray SFC and the XOR-based flat routing mechanism for similarity search.

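To make the get(k, sim) primitive and the roles of RHH and the Gray code more concrete, the following minimal Python sketch is offered. It is not the HCube implementation: the names rhh_signature, to_gray, hamming_distance and the in-memory store are illustrative assumptions. The sketch only shows how random hyperplane hashing maps similar content to nearby binary keys, how a Gray code keeps consecutive indices one bit apart, and how an XOR/Hamming threshold can realize the sim parameter.

    import numpy as np

    def rhh_signature(vector, hyperplanes):
        # Random Hyperplane Hashing: one bit per hyperplane, set when the
        # content vector lies on the positive side of that hyperplane.
        # Vectors separated by a small angle tend to agree on most bits.
        return tuple(int(np.dot(h, vector) >= 0) for h in hyperplanes)

    def to_gray(n):
        # Binary-reflected Gray code: consecutive indices differ by exactly
        # one bit, the property a Gray SFC exploits to keep similar keys on
        # neighbouring positions of the curve.
        return n ^ (n >> 1)

    def hamming_distance(a, b):
        # XOR-based distance between two binary keys; an analogous XOR
        # metric drives hop-by-hop decisions in flat routing schemes.
        return sum(x ^ y for x, y in zip(a, b))

    def get(k, sim, store, hyperplanes):
        # Sketch of a get(k, sim) primitive: return stored items whose RHH
        # keys differ from the reference content's key by at most sim bits.
        key = rhh_signature(k, hyperplanes)
        return [item for stored_key, item in store.items()
                if hamming_distance(key, stored_key) <= sim]

    # Toy usage: 8 hypothetical hyperplanes over 16-dimensional vectors.
    rng = np.random.default_rng(42)
    hyperplanes = rng.standard_normal((8, 16))
    store = {}
    for label in ("a", "b", "c"):
        v = rng.standard_normal(16)
        store[rhh_signature(v, hyperplanes)] = label

    query = rng.standard_normal(16)
    print(get(query, sim=3, store=store, hyperplanes=hyperplanes))

Under these assumptions, content vectors that are close in angle collide on most hash bits, so a small sim returns near-duplicates while a larger sim widens the neighbourhood that must be visited.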