Abstract

Available data sets, whether structured, semi-structured, or unstructured, are used for various purposes. They are mostly used to solve a problem with techniques such as visualization, descriptive analysis, and algorithms. This data process has many stages, two of which are data cleansing and exploratory data analysis (EDA); both are major operations in any data mining or machine learning study. After the data is collected from various sources, data cleansing is performed to make the data set more accurate, more useful, and less redundant. It deals with the null, duplicate, multiple, inconsistent, and inaccurate values that exist in the data set. Left in place, these errors affect both the analysis and the decision-making process. Data cleansing helps avoid misleading results such as inaccurate output, inaccurate machine learning models, and missed hidden patterns in the data set. The purpose of this paper is to study existing research on data cleansing and EDA and to state why the data cleansing process is not part of EDA.
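As an illustration only, the kinds of cleansing named above (null values, duplicate values, inconsistent values) can be sketched in a few lines of plain Python. The records, the field names (`id`, `city`, `age`), and the helpers `cleanse` and `summarize` are hypothetical and do not come from the paper; the sketch simply keeps the cleansing step and the descriptive summary as two separate steps.

```python
from statistics import mean

# Hypothetical toy records; the field names are illustrative assumptions.
raw_rows = [
    {"id": 1, "city": "Delhi ", "age": 34},
    {"id": 2, "city": "delhi", "age": None},   # null value
    {"id": 2, "city": "delhi", "age": None},   # exact duplicate
    {"id": 3, "city": "MUMBAI", "age": 29},    # inconsistent casing
    {"id": 4, "city": None, "age": 41},        # null value
]


def cleanse(rows):
    """Normalize text, skip exact duplicates, and drop rows with nulls."""
    seen = set()
    cleaned = []
    for row in rows:
        # Normalize inconsistent spellings: trim whitespace, unify case.
        city = row["city"].strip().title() if row["city"] is not None else None
        key = (row["id"], city, row["age"])
        if key in seen:
            continue  # duplicate of a record already seen
        seen.add(key)
        if city is None or row["age"] is None:
            continue  # this sketch drops rows with nulls; imputing is another option
        cleaned.append({"id": row["id"], "city": city, "age": row["age"]})
    return cleaned


def summarize(rows):
    """A small descriptive summary, run only on data that is already cleaned."""
    return {
        "count": len(rows),
        "mean_age": mean(r["age"] for r in rows),
        "cities": sorted({r["city"] for r in rows}),
    }


clean_rows = cleanse(raw_rows)
summary = summarize(clean_rows)
print(clean_rows)
print(summary)
```

Dropping rows with nulls is only one possible policy; whether to drop, impute, or flag such rows depends on the data set and is an assumption of this sketch.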

Highlights

  • Data is increasing swiftly and has become very difficult to gather and operate on

  • This study aims to state that data cleansing is not part of exploratory data analysis (EDA)

  • Firms are highly dependent on data-driven decision making, and the information system is closely integrated with business process management and used for various competitive advantages


Summary

Introduction

Data is increasing swiftly and has become very difficult to gather and operate on. To get information from it and make a decision based on that information, one has to go through a long process. Some of the tools for this are free and easy to use, such as R, Python, SAS, etc. These languages are very useful because of their English-like commands and easy-to-use syntax [6]. Data scientists get data from various sources, such as the internet and organizations, and they use several kinds of programming languages for data cleansing [14]. Various kinds of errors are present in a data set and make it dirty data, such as null /

