Abstract

MapReduce is a programming model popularized by Google since 2004. It is used to process large-scale datasets on a shared-nothing cluster, and it achieves high performance by partitioning processing into small units of work that can run in parallel across thousands of nodes in the cluster. The rapid growth in data size has made it increasingly important to uncover hidden patterns in order to acquire new knowledge and extract valuable information. However, MapReduce does not directly support the join operation. This paper discusses several two-way join algorithms and lists the advantages and disadvantages of each. We propose a new multi-way join algorithm, hash semi cascade join, for joining more than two datasets. It uses hash tables in the first phase to delete records that are not needed for the join as early as possible, which reduces the network bottleneck and increases performance. We compare this new algorithm with other multi-way joins: map-side join, reduce-side one-shot join, and reduce-side cascade join. Our experimental results show that map-side join spends more time sorting data; it produces join results with high performance on small datasets, but its running time increases as the data grow. Reduce-side one-shot join produces results in a time close to that of map-side join. Reduce-side cascade join takes more time to produce the final result. Hash semi cascade join achieves high performance by using hash tables. Because it shuffles fewer records than reduce-side one-shot join and reduce-side cascade join, it can perform the join for any dataset size. In addition, using a hash table does not affect memory usage.
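
The abstract describes hash semi cascade join only at a high level: hash tables are built in a first phase, and records that cannot contribute to the join are dropped before they are shuffled. The Python sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It simulates a three-way chain join R(a, b) join S(b, c) join T(c, d) in memory, counts every record sent to a reducer as one shuffled record, and assumes the join-key sets are small enough to be broadcast to every mapper. The function names `reduce_side_join`, `cascade_join`, and `hash_semi_cascade_join`, the relation schemas, and the sample data are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Dict[str, object]


def reduce_side_join(left: List[Record], right: List[Record], key: str) -> Tuple[List[Record], int]:
    """Simulate one map-reduce join round (hypothetical helper).

    Map: every record from both inputs is emitted under its join key,
    so every input record is shuffled. Reduce: records that share a key
    are combined pairwise.
    """
    groups = defaultdict(lambda: ([], []))
    for rec in left:
        groups[rec[key]][0].append(rec)
    for rec in right:
        groups[rec[key]][1].append(rec)
    joined = [{**l, **r} for lrecs, rrecs in groups.values() for l in lrecs for r in rrecs]
    return joined, len(left) + len(right)


def cascade_join(r: List[Record], s: List[Record], t: List[Record]) -> Tuple[List[Record], int]:
    """Reduce-side cascade join: (R join S on b), then join T on c, in two rounds."""
    rs, shuffled1 = reduce_side_join(r, s, "b")
    rst, shuffled2 = reduce_side_join(rs, t, "c")
    return rst, shuffled1 + shuffled2


def hash_semi_cascade_join(r: List[Record], s: List[Record], t: List[Record]) -> Tuple[List[Record], int]:
    """Hypothetical hash semi-join variant: prune records that cannot reach
    the final result before any shuffle, then run the same cascade join."""
    # Phase 1: hash sets of join keys (assumed small enough to broadcast).
    r_keys_b = {rec["b"] for rec in r}
    t_keys_c = {rec["c"] for rec in t}
    # S is the middle relation: keep only rows that match both R and T.
    s_kept = [rec for rec in s if rec["b"] in r_keys_b and rec["c"] in t_keys_c]
    s_keys_b = {rec["b"] for rec in s_kept}
    s_keys_c = {rec["c"] for rec in s_kept}
    # Then keep only R and T rows that match a surviving S row.
    r_kept = [rec for rec in r if rec["b"] in s_keys_b]
    t_kept = [rec for rec in t if rec["c"] in s_keys_c]
    # Phase 2: ordinary cascade join on the pruned inputs.
    return cascade_join(r_kept, s_kept, t_kept)


R = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}]
S = [{"b": 10, "c": 100}, {"b": 20, "c": 200}, {"b": 40, "c": 100}]
T = [{"c": 100, "d": "x"}, {"c": 300, "d": "y"}]

print(cascade_join(R, S, T))            # ([{'a': 1, 'b': 10, 'c': 100, 'd': 'x'}], 10)
print(hash_semi_cascade_join(R, S, T))  # ([{'a': 1, 'b': 10, 'c': 100, 'd': 'x'}], 4)
```

On this toy input both variants return the same joined row, but the plain cascade join shuffles 10 records while the pruned version shuffles 4. Pruning the middle relation S against both neighbours first, and then pruning R and T against what survives of S, removes every record that cannot appear in the output of a chain join like this one. The sketch does not count the cost of building and distributing the key sets.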
