Abstract

Today, very large amounts of data are collected from millions of websites, applications, social media resources, surveys, video surveillance platforms, and many other sources. Processing the large datasets generated every day can yield useful information, and handling data at this scale requires distributed data processing platforms. Big data processing and analytics platforms such as Hadoop and Spark have machine learning libraries that run in a distributed manner and exploit the advantages of distributed computing. For example, the Mahout library uses the Hadoop platform, while the Spark MLlib library uses the Spark platform. However, these platforms appear either to lack implementations of the algorithms used in the data mining steps or to implement the algorithms of only some of those steps. In this research, algorithms from different data mining steps are implemented on a big data platform and their performance is evaluated. As a case study, the Sparkling Water platform was chosen as the big data processing platform, and a banking dataset was used to test the implemented data mining algorithms. A software layer containing all data mining steps was developed on the Sparkling Water platform, and a performance evaluation was conducted. The evaluation showed that the performance improvement offered by distributed data processing was achieved.
