Abstract

In the realm of distributed computing frameworks, such as Apache Spark and MapReduce Hadoop, the efficacy of these frameworks varies across diverse applications and algorithms contingent upon distinctive evaluation metrics and critical parameters. This research paper diligently scrutinizes the extant body of research that compares these two frameworks concerning said evaluation metrics and parameters. Subsequently, it conducts empirical investigations to authenticate the performance of these frameworks in the context of an iterative Gradient Boosting Tree Regression (GBTR) algorithm. Remarkably, the comparative analyses in previous studies encompass a spectrum of iterative machine learning regression and classification techniques, batch processing, SQL, and Graph processing algorithms. Furthermore, numerous investigations have explored the application of machine learning algorithms encompassing logistic regression, Page Rank, K-Means, KNN, and the HiBench suite. This paper presents the comparison between the two distributed computing platforms on iterative GBTR for classification task on the HIGGS dataset from the physics domain and for the regression task on the Covid-19 dataset from the healthcare domain. The empirical findings corroborate that Apache Spark exhibits superior execution speed in iterative tasks when the available physical memory significantly exceeds the dataset size. Conversely, Hadoop outperforms Spark when dealing with substantial datasets or constrained physical memory resources.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.