Prediction with partitioning: Big data analytics using regression techniques

K Saritha, Sajimon Abraham

doi:10.1109/netact.2017.8076768

Abstract

The cumulative growth of data from various sources has led to the era of big data. Big Data analytics creates opportunities to design competitive offer packages for customers and to provide reliable services, but the analysis must be accurate and timely for successful decision making. Various statistical methods have been developed for testing and analyzing Big Data. Traditional statistical analysis relies on sampling to generate a predictive model. To overcome this limitation, Big Data is partitioned into sub data sets and statistical analysis is applied to each subset. Because the structure of the data sets must be studied first, we work through the steps of statistical modeling up to Exploratory Data Analysis (EDA). The dependent variable and the independent variables are identified, and a suitable parametric model is suggested. Regression techniques are used to describe the relationship between the dependent and independent variables. Here we focus on different linear regression techniques. Performance is evaluated through simulation on experimental data sets from the UCI Machine Learning Repository, and multivariate linear regression is seen to perform better in parametric modeling.
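The partition-then-regress idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic data, the function names `fit_linear` and `partitioned_fit`, the number of partitions, and the step of averaging the per-partition coefficients are all assumptions made for illustration, since the abstract does not say how results from the subsets are combined.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def partitioned_fit(X, y, n_parts):
    """Split the rows into n_parts subsets and fit one linear model per subset."""
    parts = zip(np.array_split(X, n_parts), np.array_split(y, n_parts))
    return np.array([fit_linear(Xp, yp) for Xp, yp in parts])


# Synthetic stand-in for a large data set (hypothetical, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 3))
true_coef = np.array([2.0, 1.5, -0.7, 0.3])  # intercept followed by three slopes
y = true_coef[0] + X @ true_coef[1:] + rng.normal(scale=0.1, size=10_000)

# One coefficient vector per partition.
per_part = partitioned_fit(X, y, n_parts=10)

# Assumption: combine the subset models by simple averaging of coefficients.
combined = per_part.mean(axis=0)
```

Each row of `per_part` holds the coefficients fitted on one subset, so the spread across rows gives a rough sense of how stable the model is from one partition to the next.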

Full Text