Abstract

While the Hadoop MapReduce paradigm offers a linearly scalable approach to solving many complex problems, it does not work for every problem type. General examples of problems that can and cannot be solved with MapReduce have been discussed in a number of sources, but the requirements for effective use of MapReduce remain unclear. This paper takes the approach that it is not the problem type but the characteristics of the algorithm and the data that must be understood to implement a solution in MapReduce. The paper examines the MapReduce paradigm and derives the key requirements and constraints it implies. These requirements and constraints are stated as a set of rules that can be applied to various problem solutions so that the algorithm and data specifications make effective use of MapReduce. These characteristics can also guide the refactoring of incompatible algorithms so that they can be used effectively under MapReduce. Examples from the healthcare analytics space illustrate how these characteristics are used to create effective MapReduce solutions.
