Abstract

Data pre-processing transforms raw data into a format that can be processed more easily and effectively in data mining, machine learning, and other data science tasks. Pre-processing techniques are generally applied at the early stages of the machine learning and AI development pipeline to ensure accurate results: data must pass through several pre-processing steps before a machine learning model can use it. We intend to build a GUI that lets users upload data containing inconsistencies and choose among various pre-processing options. After the data is processed, users can analyze it, which helps them decide what further pre-processing is needed. Pre-processing consists of three main phases: data cleaning, data transformation, and data reduction. Data cleaning involves correcting or removing inaccurate, corrupted, improperly formatted, duplicate, or incomplete data from a dataset. Data transformation converts data into a format suitable for data mining and for extracting strategic information. Data reduction shrinks the original data so that it can be represented in a much smaller volume. Pre-processing requires a lot of work, so we ask whether it can be automated. With a single click, this web application transforms inconsistent data into consistent data that can be used in later stages. The GUI is built with the Streamlit library in Python; in the backend, we intend to use the AutoClean library along with other libraries, depending on the data format.

Key Words: Data Cleaning, AutoClean, Data Transformation, Data Reduction, Streamlit.
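The three phases named in the abstract can be illustrated with a minimal sketch. It uses pandas only and is not the AutoClean API or the authors' implementation. The function names (`clean`, `transform`, `reduce_columns`, `preprocess`) are hypothetical, and the specific choices (median/mode imputation, min-max scaling, dropping constant columns) are assumptions picked for brevity.

```python
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Data cleaning: drop duplicate rows and fill missing values."""
    out = df.drop_duplicates().reset_index(drop=True)
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            # Assumption: impute numeric gaps with the column median.
            out[col] = out[col].fillna(out[col].median())
        else:
            # Assumption: impute other gaps with the most frequent value.
            mode = out[col].mode()
            if not mode.empty:
                out[col] = out[col].fillna(mode.iloc[0])
    return out


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Data transformation: min-max scale numeric columns to [0, 1]."""
    out = df.copy()
    for col in out.select_dtypes(include="number").columns:
        lo, hi = out[col].min(), out[col].max()
        if hi > lo:  # skip constant columns to avoid dividing by zero
            out[col] = (out[col] - lo) / (hi - lo)
    return out


def reduce_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Data reduction: drop columns that hold a single distinct value."""
    keep = [c for c in df.columns if df[c].nunique(dropna=False) > 1]
    return df[keep]


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Run the three phases in order: clean, transform, reduce."""
    return reduce_columns(transform(clean(df)))


if __name__ == "__main__":
    # Made-up sample with a duplicate row, missing values,
    # and a constant column.
    raw = pd.DataFrame(
        {
            "age": [25, 25, None, 40, 31],
            "city": ["Pune", "Pune", None, "Delhi", "Pune"],
            "source": ["web", "web", "web", "web", "web"],
        }
    )
    print(preprocess(raw))
```

In a GUI such as the one described, a function like `preprocess` would be called on the uploaded file once the user has selected options. Each phase is kept as a separate function so that individual steps can be switched on or off.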
