Abstract

Detecting and eliminating duplicate records is one of the major problems in the broad area of data cleaning and data quality in information systems. The same logical real-world entity often has multiple representations in a data warehouse. Duplicate elimination is difficult because duplicates arise from several types of errors, such as typographical mistakes and different representations of the same logical value. The main objective of this work is to detect exact and inexact duplicates using duplicate detection and elimination rules, thereby improving the quality of the data; the importance of data accuracy and quality has grown with the explosion in data volume. In the duplicate elimination step, only one copy of each set of exactly duplicated records is retained and the remaining duplicates are discarded. This elimination step is essential for producing clean data. Before elimination, similarity threshold values are computed for all records in the data set, and these threshold values drive the elimination decisions.
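To make the threshold-based elimination step concrete, the following is a minimal sketch in Python. It assumes records are flat strings and uses the standard library's difflib ratio as a stand-in similarity measure; the abstract does not specify the actual measure, threshold value, or rule set, so all of those choices here are illustrative.

```python
# Sketch of threshold-based duplicate elimination (illustrative assumptions:
# records are strings, similarity is difflib's ratio, threshold is 0.9).
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1] between two records."""
    return SequenceMatcher(None, a, b).ratio()


def eliminate_duplicates(records: list[str], threshold: float = 0.9) -> list[str]:
    """Keep the first copy of each record; drop exact and near-duplicate copies."""
    kept: list[str] = []
    for rec in records:
        # A record is treated as a duplicate if it matches a kept record exactly
        # (exact duplicate) or its similarity meets the threshold (inexact duplicate).
        if any(rec == k or similarity(rec, k) >= threshold for k in kept):
            continue
        kept.append(rec)
    return kept


if __name__ == "__main__":
    data = ["John Smith, 42 Oak St", "Jon Smith, 42 Oak St.", "Mary Jones, 7 Elm Rd"]
    print(eliminate_duplicates(data))
    # -> ['John Smith, 42 Oak St', 'Mary Jones, 7 Elm Rd']
```

In practice the pairwise comparison would be restricted to candidate pairs (for example via blocking or sorted-neighborhood methods) rather than comparing every record against every kept record, but the threshold decision shown above is the core of the elimination rule described in the abstract.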
