Abstract
Big Data has become commonplace in most Internet-based applications, which, by delivering services to planetary-scale numbers of users, generate very large data sets. Such data sets are considered a valuable source of analytic information and knowledge for many purposes and domains. It is increasingly claimed that Big Data and machine learning, especially data mining, are the basis for developing advanced analytics platforms that turn data into valuable assets, yield competitive advantage and support better decisions. At the same time, however, Big Data applications are proving to be killer applications for state-of-the-art machine learning and data mining algorithms. Indeed, traditional data mining frameworks such as WEKA and R, as well as commercial ones from large vendors such as IBM SPSS Modeler, SAS Enterprise Miner and Oracle Data Mining, face the challenges of (1) mining large data sets within short times and (2) coping with high rates of data generation. The envisaged way to deal effectively with such challenges is to move to Cloud-based versions of such frameworks and to develop new frameworks implemented on Cloud platforms. In either case, data mining and machine learning algorithms are being fully implemented on Cloud platforms to meet the new efficiency and performance requirements of Big Data. Among the newly developed frameworks is Apache Mahout, whose goal is to provide an environment for quickly creating scalable, high-performance machine learning applications. In this paper we analyse the performance of several Apache Mahout clustering algorithms on a Twitter streaming dataset, running on a Hadoop MapReduce cluster, according to various evaluation criteria.