Abstract
The combined impact of new computing resources and techniques with an increasing avalanche of large datasets is transforming many research areas and may lead to technological breakthroughs that can be used by billions of people. In recent years, Machine Learning and especially its subfield Deep Learning have seen impressive advances. Techniques developed within these two fields are now able to analyze and learn from huge amounts of real-world examples in disparate formats. While the number of Machine Learning algorithms is extensive and growing, so is the number of their implementations in frameworks and libraries. Software development in this field is fast paced, with a large amount of open-source software coming from academia, industry, start-ups, and wider open-source communities. This survey presents a recent time-slide comprehensive overview with comparisons as well as trends in the development and usage of cutting-edge Artificial Intelligence software. It also provides an overview of massive parallelism support that is capable of scaling computation effectively and efficiently in the era of Big Data.
Highlights
Data mining (DM) is the core stage of the knowledge discovery process that aims to extract interesting and potentially useful information from data (Goodfellow et al 2016; Mierswa 2017)
Many methods can be applied for model evaluation, such as cross-validation, k-fold, and holdout, with various metrics such as accuracy (ACC), precision, recall, F1, Matthews correlation coefficient (MCC), receiver operating characteristic (ROC), area under the curve (AUC), mean absolute error (MAE), mean squared error (MSE), and root-mean-square error (RMSE)
The Machine Learning (ML) group at National Taiwan University provides support for MPI LibLinear, which is an extension of LibLinear for distributed environments, and for Spark LibLinear, which is a Spark implementation based on LibLinear and integrated with the Hadoop Distributed File System (NTU 2018)
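As an illustration of the evaluation metrics named in the highlights above, the following is a minimal sketch in plain Python of how several of them can be computed from predictions. It is not taken from the survey: the function names (`confusion_counts`, `classification_metrics`, `regression_metrics`) and the toy labels are assumptions chosen for illustration, and the classification part assumes binary labels encoded as 0 and 1.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels encoded as 0/1 (assumed encoding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Return ACC, precision, recall, F1 and MCC for binary predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # MCC is undefined when any marginal count is zero; 0.0 is used here by convention.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "ACC": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "F1": f1,
        "MCC": mcc,
    }


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE and RMSE for real-valued predictions."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
    }


# Hypothetical toy data, for illustration only.
cls = classification_metrics([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 0, 1, 1, 0])
reg = regression_metrics([3.0, 5.0, 2.0, 7.0], [2.0, 5.0, 4.0, 8.0])
print(cls)
print(reg)
```

In practice these scores would be computed on data held out from training, for example on a holdout split or averaged over the folds of a k-fold cross-validation, rather than on the data used to fit the model.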
Summary
Data mining (DM) is the core stage of the knowledge discovery process that aims to extract interesting and potentially useful information from data (Goodfellow et al 2016; Mierswa 2017). The surge in the Volume of information to be processed by data mining and ML algorithms, especially given its Variety, demands new transformative parallel and distributed computing solutions capable of scaling computation effectively and efficiently (Cano 2018). In this context, this survey presents a comprehensive overview with comparisons as well as trends in the development and usage of cutting-edge AI software, libraries and frameworks, which are able to learn and adapt from previous experience using ML and DL techniques to perform more accurate and more effective operations for problem solving (Rouse 2018).