Abstract

The combined impact of new computing resources and techniques, together with a growing avalanche of large datasets, is transforming many research areas and may lead to technological breakthroughs that can be used by billions of people. In recent years, Machine Learning and especially its subfield Deep Learning have seen impressive advances. Techniques developed within these two fields are now able to analyze and learn from huge amounts of real-world examples in disparate formats. The number of Machine Learning algorithms is extensive and growing, and so is the number of their implementations in frameworks and libraries. Software development in this field is fast paced, with a large amount of open-source software coming from academia, industry, start-ups, and wider open-source communities. This survey presents a recent time-slide comprehensive overview, with comparisons as well as trends in the development and usage of cutting-edge Artificial Intelligence software. It also provides an overview of massive parallelism support that is capable of scaling computation effectively and efficiently in the era of Big Data.

Highlights

  • Data mining (DM) is the core stage of the knowledge discovery process that aims to extract interesting and potentially useful information from data (Goodfellow et al 2016; Mierswa 2017)

  • Many methods can be applied for model evaluation, such as cross-validation, k-fold, and holdout, with various metrics such as accuracy (ACC), precision, recall, F1, Matthews correlation coefficient (MCC), receiver operating characteristic (ROC), area under the curve (AUC), mean absolute error (MAE), mean squared error (MSE), and root-mean-square error (RMSE)

  • The Machine Learning (ML) group at National Taiwan University provides support for MPI LibLinear, which is an extension of LibLinear for distributed environments, and for Spark LibLinear, which is a Spark implementation based on LibLinear and integrated with the Hadoop distributed file system (NTU 2018)
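
Several of the evaluation metrics named in the highlights above can be written out in a few lines. The following is a minimal sketch in plain Python, not code from the survey or from any of the libraries it covers. All function names (`confusion_counts`, `classification_metrics`, `regression_metrics`, `k_fold_indices`) are hypothetical, binary labels are assumed to be encoded as 0/1, and the k-fold split is assumed to be contiguous and unshuffled. ROC and AUC are left out because they require predicted scores rather than hard labels.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels encoded as 0/1 (an assumption)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Return ACC, precision, recall, F1 and MCC for binary predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    acc = (tp + tn) / len(y_true)
    # Each ratio falls back to 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"acc": acc, "precision": precision, "recall": recall,
            "f1": f1, "mcc": mcc}


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE and RMSE for numeric predictions."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse)}


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous, unshuffled folds."""
    # The first (n_samples % k) folds get one extra sample each.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0)
             for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = (list(range(0, start))
                     + list(range(start + size, n_samples)))
        yield train_idx, test_idx
        start += size
```

In a k-fold evaluation, each `(train_idx, test_idx)` pair would select the samples used to fit and to score a model, and the per-fold metrics would then be averaged. Production code would normally rely on a library implementation such as the one in Scikit-Learn rather than on hand-written versions like these.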

Summary

Introduction

Data mining (DM) is the core stage of the knowledge discovery process that aims to extract interesting and potentially useful information from data (Goodfellow et al 2016; Mierswa 2017). The surge in the Volume of information to be processed by data mining and ML algorithms, especially given its Variety, demands new transformative parallel and distributed computing solutions capable of scaling computation effectively and efficiently (Cano 2018). In this context, this survey presents a comprehensive overview, with comparisons as well as trends in the development and usage of cutting-edge AI software, libraries, and frameworks, which are able to learn and adapt from previous experience using ML and DL techniques to perform more accurate and more effective operations for problem solving (Rouse 2018).

Machine Learning process
Neural Networks and Deep Learning
Accelerated computing
Machine Learning frameworks and libraries
RapidMiner
Scikit-Learn
LibSVM
LibLinear
Vowpal Wabbit
XGBoost
Interactive data analytic and visualization tools
Other data analytic frameworks and libraries
Deep Learning frameworks and libraries
TensorFlow
Chollet
Microsoft CNTK
Caffe2
PyTorch
Chainer
Theano
Performance-wise preliminary
Deep Learning wrapper libraries
Machine Learning and Deep Learning frameworks and libraries with MapReduce
Deeplearning4j
Apache Spark MLlib and Spark ML
Other frameworks and libraries with MapReduce
Findings
Conclusions