Abstract
Data cleaning is an automated process of detecting, removing, and correcting incomplete, incorrect, inaccurate, and irrelevant data in a record set. Our system works on plain text (*.txt) files using the Extract, Transform and Load (ETL) model. In this paper we present a set of algorithms to correct errors such as alphanumeric errors, invalid gender values, invalid ID patterns, and redundant IDs. The text files serve as data storage, holding records in a tabular format, and the algorithms are applied to each field value depending on its nature.
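The field-level checks named above can be illustrated with a minimal sketch. This is not the paper's algorithm: the tab-separated three-column layout (ID, name, gender), the ID format, the gender mapping, and all function names are assumptions chosen for illustration only.

```python
import re

# Assumption: an ID is two uppercase letters followed by four digits, e.g. "AB1234".
ID_PATTERN = re.compile(r"[A-Z]{2}\d{4}")
# Assumption: accepted gender spellings and their canonical forms.
GENDER_MAP = {"m": "M", "male": "M", "f": "F", "female": "F"}


def clean_name(value):
    """Alphanumeric error: drop digits and symbols from an alphabetic field."""
    return re.sub(r"[^A-Za-z ]", "", value).strip()


def clean_gender(value):
    """Invalid gender: normalize the value, or return None if it cannot be mapped."""
    return GENDER_MAP.get(value.strip().lower())


def is_valid_id(value):
    """Invalid ID pattern: the whole field must match the assumed format."""
    return ID_PATTERN.fullmatch(value.strip()) is not None


def clean_records(lines, sep="\t"):
    """Extract rows, transform each field by its type, return (clean, rejected)."""
    clean, rejected, seen_ids = [], [], set()
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) != 3:
            rejected.append((line, "wrong field count"))
            continue
        record_id, name, gender = (f.strip() for f in fields)
        if not is_valid_id(record_id):
            rejected.append((line, "invalid ID pattern"))
        elif record_id in seen_ids:
            # Redundant ID: keep the first occurrence, reject later ones.
            rejected.append((line, "redundant ID"))
        else:
            seen_ids.add(record_id)
            clean.append((record_id, clean_name(name), clean_gender(gender)))
    return clean, rejected
```

In an ETL setting, the lines would be read from the source text file (extract), passed through `clean_records` (transform), and the cleaned rows written to a target file (load). Rejected rows are returned with a reason rather than silently dropped, so they can be reviewed.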