Abstract

Heterogeneity is one of the major features of big data, and heterogeneous data cause problems in data integration and Big Data analytics. This paper introduces data processing methods for heterogeneous data and Big Data analytics, Big Data tools, and some traditional data mining (DM) and machine learning (ML) methods. Deep learning and its potential in Big Data analytics are analysed. The benefits of the confluence of Big Data analytics, deep learning, high performance computing (HPC), and heterogeneous computing are presented. Challenges of dealing with heterogeneous data and Big Data analytics are also discussed.

Highlights

  • Heterogeneous data are any data with high variability of data types and formats

  • Heterogeneous data are often generated by the Internet of Things (IoT)

  • This paper covers four aspects: 1) data processing methods, including data cleaning, data integration, dimension reduction, and data normalization, for heterogeneous data and Big Data analytics; 2) big data concepts, Big Data analytics, and Big Data tools; 3) a comparison of traditional data mining (DM) and machine learning (ML) methods with deep learning, especially their feasibility in Big Data analytics; 4) the potential of the confluence of Big Data analytics, deep learning, high performance computing (HPC), and heterogeneous computing

Summary

Introduction

Heterogeneous data are any data with high variability of data types and formats. They may be ambiguous and of low quality because of missing values, high data redundancy, and untruthfulness. Data generated from IoT often have four characteristic features [1]. Because of the variety of data acquisition devices, the acquired data differ in type and are therefore heterogeneous. Terminological heterogeneity refers to variations in the names used for the same entities across different data sources. Too much data can lead to high cognitive and data processing costs. A processing layer converts individual attributes into information in terms of ‘what-when-where’. Appropriate interpretation of heterogeneous big data requires detailed metadata. Heterogeneous, incomplete, uncertain, sparse, and multi-source data are pre-processed by data fusion techniques.

This paper covers four aspects: 1) data processing methods, including data cleaning, data integration, dimension reduction, and data normalization, for heterogeneous data and Big Data analytics; 2) big data concepts, Big Data analytics, and Big Data tools; 3) a comparison of traditional DM/ML methods with deep learning, especially their feasibility in Big Data analytics; 4) the potential of the confluence of Big Data analytics, deep learning, HPC, and heterogeneous computing.
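
As an illustration of terminological heterogeneity, the following minimal Python sketch renames field-name variants from two hypothetical IoT sources to shared canonical names, so that both records expose the same what (temperature), when (timestamp), and where (location) attributes. The alias table, field names, and sample records are assumptions made for this example and are not taken from the paper; real integration pipelines typically need richer schema or ontology matching.

```python
# Hypothetical alias table (an assumption for illustration): maps
# source-specific field names to a single canonical name.
ALIASES = {
    "temp": "temperature_c",
    "temperature": "temperature_c",
    "t_celsius": "temperature_c",
    "ts": "timestamp",
    "time": "timestamp",
    "loc": "location",
    "site": "location",
}


def harmonize(record):
    """Return a copy of `record` with known field-name variants renamed.

    Lookup is case-insensitive and ignores surrounding whitespace.
    Fields with no known alias are kept under their original name.
    """
    out = {}
    for key, value in record.items():
        canonical = ALIASES.get(key.strip().lower(), key)
        out[canonical] = value
    return out


# Two made-up records that describe the same kind of measurement
# but use different names for the same entities.
sensor_a = {"temp": 21.5, "ts": "2020-01-01T00:00:00Z", "loc": "lab-1"}
sensor_b = {"Temperature": 22.1, "time": "2020-01-01T00:00:05Z", "site": "lab-1"}

merged = [harmonize(sensor_a), harmonize(sensor_b)]
```

After harmonization both records share the same keys, so they can be stored in one table or passed to later cleaning and normalization steps without per-source special cases.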

Data Cleaning
Data Integration
Dimension Reduction and Data Normalization
Conclusion