Abstract

As data volumes grow over time, the primary challenge is how to store and process that data and extract useful information from it. Our analysis of classification algorithms and the MapReduce programming model leads to the conclusion that MapReduce's distributed file system and parallel computing attributes are well suited to building classifier models. The main reason is its parallel processing: the data is divided into splits that are processed in parallel, and the output of each split is then reduced to a single result. In this paper, we study how to use the MapReduce model to build a classifier model. We use a cancer dataset to predict whether a person has cancer, using the Naive Bayes and KNN classification algorithms. We compare the two on the basis of computational time and on sensitivity, specificity, and accuracy. This comparison allows us to determine which of the two algorithms works better on the MapReduce programming model.
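
The divide, process, and reduce pattern described above can be illustrated with a minimal single-machine sketch. It is not the implementation used in this paper: the function names (`map_partition`, `reduce_counts`, `confusion_metrics`), the toy feature values, and the confusion-matrix numbers are all hypothetical, and plain Python stands in for a real MapReduce cluster. The sketch only shows how the per-class and per-feature counts that a Naive Bayes classifier needs could be gathered per split and merged, and how sensitivity, specificity, and accuracy follow from a confusion matrix.

```python
from collections import Counter
from functools import reduce


def map_partition(rows):
    """Map step (hypothetical): count labels and feature/label pairs in one split."""
    counts = Counter()
    for features, label in rows:
        counts[("class", label)] += 1
        for index, value in enumerate(features):
            counts[("feat", index, value, label)] += 1
    return counts


def reduce_counts(left, right):
    """Reduce step (hypothetical): merge the partial counts of two splits."""
    return left + right


def confusion_metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy


# Toy, made-up records: ((feature_0, feature_1), label), already divided into two splits.
splits = [
    [(("high", "yes"), "cancer"), (("low", "no"), "healthy")],
    [(("high", "no"), "cancer"), (("low", "no"), "healthy")],
]

# Each split is mapped independently, then all partial counts are reduced to one result.
total = reduce(reduce_counts, map(map_partition, splits))
```

Here `total` holds the merged counts from which Naive Bayes class priors and conditional probabilities could be estimated. In an actual MapReduce job the map calls would run on separate nodes rather than in a local loop.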
