Abstract

Big data can be generally characterised by five Vs: Volume, Velocity, Variety, Veracity and Variability. Many studies have focused on machine learning as a powerful tool for big data processing. In the machine learning context, learning algorithms are typically evaluated in terms of accuracy, efficiency, interpretability and stability. These four dimensions relate strongly to veracity, volume, variety and variability, and are affected by both the nature of the learning algorithms and the characteristics of the data. This chapter analyses in depth how the quality of computational models is impacted by data characteristics as well as by the strategies involved in learning algorithms. The chapter also introduces a unified framework for the control of machine learning tasks, aimed at the appropriate employment of algorithms and the efficient processing of big data. In particular, the framework is designed to achieve effective selection of data pre-processing techniques, covering the selection of relevant attributes, the sampling of representative training and test data, and the appropriate handling of missing values and noise. More importantly, the framework allows suitable machine learning algorithms to be employed on the training data provided by the pre-processing stage, towards the building of accurate, efficient and interpretable computational models.
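
To make the two-stage workflow concrete, the sketch below shows one plausible realisation in Python using scikit-learn: imputation for missing values, selection of relevant attributes, stratified sampling of training and test data, and a learning algorithm trained on the pre-processed data. The dataset, pipeline steps and parameter choices are illustrative assumptions, not the chapter's own implementation.

```python
# A minimal sketch of the pre-processing-then-learning workflow the abstract
# describes, assuming scikit-learn and a tabular dataset (illustrative only).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

# Sampling of representative training and test data via a stratified split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

pipeline = Pipeline([
    # Dealing with missing values: impute with the per-attribute median
    # (this example dataset is complete, so the step only illustrates the stage).
    ("impute", SimpleImputer(strategy="median")),
    # Selection of relevant attributes: keep the 10 most informative features.
    ("select", SelectKBest(score_func=f_classif, k=10)),
    # Employment of a learning algorithm on the pre-processed training data.
    ("model", RandomForestClassifier(n_estimators=100, random_state=0)),
])

pipeline.fit(X_train, y_train)
print("Test accuracy:", pipeline.score(X_test, y_test))
```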
