Computational load reduction of fuzzy duplicate detection in large amounts of information

E Sharapova

doi:10.1088/1757-899x/734/1/012119

Open Access

https://doi.org/10.1088/1757-899x/734/1/012119

Journal: IOP Conference Series: Materials Science and Engineering
Publication Date: Jan 1, 2020
License type: cc-by

Affiliation: Vladimir State University

#Context Analysis Methods #Low Computational Costs #Fuzzy Duplicates #Detection Of Duplicates #Computational Costs #Low Costs #Small Threshold #Large Costs #Computational Load Reduction #Match Threshold

Abstract

The paper addresses the detection of fuzzy duplicates of documents in large amounts of information at low computational cost. Existing methods give either low search completeness at low computational cost or acceptable completeness at very high computational cost. A combined method of detecting fuzzy duplicates is proposed. First, signatures are used to search the whole set of documents for similar texts; then the texts found in this way are compared in detail using context analysis methods. Specifically, the method first performs an approximate search for similar documents using a descriptive words signature with a small match threshold. A detailed search for matches among the previously found documents is then performed using the shingles method.
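The two-stage idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the signature is built or compared, so everything below is an assumption made for illustration: the signature is taken to be a bit tuple over a fixed, hypothetical `vocabulary`, the coarse match is the share of common set bits, the fine stage uses Jaccard similarity over word 3-shingles, and the function names and threshold values are invented here rather than taken from the paper.

```python
import re
from itertools import combinations


def tokenize(text):
    # Lowercase and split into alphanumeric word tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def word_signature(text, vocabulary):
    # Assumed signature form: one bit per vocabulary word, 1 if it occurs.
    words = set(tokenize(text))
    return tuple(int(w in words) for w in vocabulary)


def signature_match(sig_a, sig_b):
    # Share of bits set in either signature that are set in both.
    either = sum(1 for a, b in zip(sig_a, sig_b) if a or b)
    both = sum(1 for a, b in zip(sig_a, sig_b) if a and b)
    return both / either if either else 0.0


def shingles(text, k=3):
    # Set of overlapping word k-grams.
    tokens = tokenize(text)
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_fuzzy_duplicates(docs, vocabulary,
                          coarse_threshold=0.3, fine_threshold=0.5, k=3):
    # Stage 1: cheap signature comparison with a deliberately low threshold.
    # Stage 2: shingle comparison, run only on pairs that passed stage 1.
    sigs = [word_signature(d, vocabulary) for d in docs]
    results = []
    for i, j in combinations(range(len(docs)), 2):
        if signature_match(sigs[i], sigs[j]) < coarse_threshold:
            continue
        score = jaccard(shingles(docs[i], k), shingles(docs[j], k))
        if score >= fine_threshold:
            results.append((i, j, score))
    return results


# Hypothetical toy data, not from the paper.
docs = [
    "the quick brown fox jumps over the lazy dog near the river",
    "the quick brown fox jumps over the lazy dog near the bridge",
    "stock markets fell sharply after the central bank raised rates",
]
vocabulary = ["fox", "dog", "river", "bank", "markets", "rates"]
pairs = find_fuzzy_duplicates(docs, vocabulary)
```

In this sketch the low `coarse_threshold` keeps the first stage permissive, so that likely duplicates are rarely discarded before the more expensive shingle comparison. The pairwise loop over signatures is kept for brevity; a real system handling large collections would presumably index signatures rather than compare every pair.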
