Abstract

In this paper, we examine the importance of data locality in the HBase architecture. HBase has a dynamic master-slave architecture, and data locality, i.e. moving the logic or processing close to the data, is the main principle it follows for efficient performance. Data locality holds as long as every region server has the data blocks of its respective regions available locally. However, if a region server crashes or is restarted, or if the regions are redistributed across the region servers by load balancing, data locality is lost during that time. Performance is significantly affected when data locality is misconfigured in the cluster. The HMaster uses the .META table [4] to find the region server that hosts the regions containing the requested rows. With these disadvantages and challenges in mind, we propose to improve data locality by allocating each region, wherever possible, to the region server that holds the most data blocks of that region. We propose an algorithm based on the HRegion locality index that decides the criteria for allocating regions to region servers so as to maintain data locality.
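The allocation idea can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper, only one possible reading of it under stated assumptions: the locality index of a region on a server is taken to be the fraction of the region's blocks that have a replica on that server, and each region goes to the server with the highest index. The names `locality_index`, `assign_regions`, `block_counts`, and `total_blocks` are hypothetical and do not come from HBase.

```python
from typing import Dict


def locality_index(local_blocks: int, total_blocks: int) -> float:
    """Assumed definition: share of a region's blocks stored on one server."""
    if total_blocks <= 0:
        return 0.0
    return local_blocks / total_blocks


def assign_regions(
    block_counts: Dict[str, Dict[str, int]],
    total_blocks: Dict[str, int],
) -> Dict[str, str]:
    """Map each region to the server with the highest locality index.

    block_counts[region][server] is the number of the region's blocks that
    have a replica on that server; total_blocks[region] is the region's
    total block count. Both inputs are hypothetical, for illustration only.
    """
    assignment: Dict[str, str] = {}
    for region, per_server in block_counts.items():
        if not per_server:
            # No candidate server is known for this region; leave it unassigned.
            continue
        total = total_blocks.get(region, 0)
        # Iterate servers in sorted order so ties resolve deterministically
        # (max returns the first of several equal maxima).
        best = max(
            sorted(per_server),
            key=lambda server: locality_index(per_server[server], total),
        )
        assignment[region] = best
    return assignment


if __name__ == "__main__":
    counts = {
        "region-1": {"rs-a": 8, "rs-b": 2},
        "region-2": {"rs-a": 1, "rs-b": 5},
    }
    totals = {"region-1": 10, "region-2": 5}
    print(assign_regions(counts, totals))
```

This sketch ignores server load entirely; a practical balancer would have to weigh locality against how many regions each server already hosts.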
