Abstract

Rapid advances in computing technology have led to enormous volumes of data being generated and stored around the world. Computing devices serve many purposes and are essential to businesses, scientists, governments, and engineers, and the data they produce comes from everywhere: sensors gathering climate data, posts to social media sites, and cell phones. Often the data is too large to store on a single machine. To reduce processing time and provide sufficient storage, the MapReduce programming model divides the workload among the computers in a network. Consequently, the performance of MapReduce depends strongly on how evenly it distributes this workload, which in turn depends on the algorithm that partitions the data. To avoid uneven distribution, data sampling is applied: how well the partitioner spreads the data depends on how large and representative the sample is and on how thoroughly it is analyzed, and a good sample improves both load balancing and memory consumption. In addition, micro-partitioning divides the workload into many small tasks that are scheduled dynamically at runtime; this approach is effective only in systems with high-throughput, low-latency task schedulers and efficient data materialization. To improve scheduling accuracy further, we propose the MapReduce Task Scheduling algorithm for Deadline constraints (MTSD), which lets a user specify a job's deadline and tries to complete the job before that deadline. MTSD includes a node classification algorithm that measures each node's computing capacity and classifies the nodes of a heterogeneous cluster into several levels. Building on this classification, we introduce a novel data distribution model that distributes data to each node according to its capacity level.
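To make the sampling-based partitioning idea concrete, the sketch below estimates range boundaries from a random sample of keys and then assigns each record to the range it falls into. It is a minimal, self-contained Python illustration, not the paper's actual partitioner; the sample size, the quantile-based boundary choice, and the function names (boundaries_from_sample, partition_of) are assumptions introduced for illustration.

import bisect
import random

def boundaries_from_sample(keys, num_partitions, sample_size=1000):
    """Estimate range-partition boundaries from a random sample of keys.

    A large, representative sample yields boundaries that split the key
    space into roughly equal-sized partitions, which is what balances
    the per-node workload.
    """
    sample = sorted(random.sample(keys, min(sample_size, len(keys))))
    # Take num_partitions - 1 evenly spaced quantiles as split points.
    step = len(sample) / num_partitions
    return [sample[int(step * i)] for i in range(1, num_partitions)]

def partition_of(key, boundaries):
    """Map a key to the index of the range it falls into."""
    return bisect.bisect_right(boundaries, key)

if __name__ == "__main__":
    # Skewed synthetic keys: squaring concentrates mass near the low end.
    keys = [random.randint(0, 100) ** 2 for _ in range(100_000)]
    num_partitions = 4
    boundaries = boundaries_from_sample(keys, num_partitions)

    counts = [0] * num_partitions
    for k in keys:
        counts[partition_of(k, boundaries)] += 1
    print("records per partition:", counts)

In a Hadoop deployment the same sampled split points would typically be fed to a custom Partitioner (for example, Hadoop's TotalOrderPartitioner), but this standalone version is enough to show why the size and representativeness of the sample directly determine how evenly the partitions come out.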
