Similarity detection based on document matrix model and edit distance algorithm

Artur Niewiarowski

doi:10.24423/cames.277

Abstract

This paper presents a new algorithm for measuring the similarity between two text documents. The method is built on a so-called “edit distance matrix” (similarity matrix), whose elements are computed from a formula based on Levenshtein distances between sequences of sentences. The Levenshtein distance algorithm (LDA) is used in place of the various implementations of stemming or lemmatization. The proposed algorithm is fast and precise, and can be applied to very large documents (e.g., books, diploma theses, newspapers). It also appears to be versatile across the most common European languages, such as Polish, English, German, French, and Russian. The tool is intended for university employees and students who need to determine the level of similarity between analyzed documents. The results were confirmed by the tests presented in the article.
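The abstract does not give the formula used to fill the matrix, so the following is only a minimal sketch of the general idea, not the author's method. It assumes that each document has already been split into sentences, that every sentence of one document is compared with every sentence of the other, and that the distance is normalized by the longer sentence's length. The function names `levenshtein`, `similarity`, and `similarity_matrix`, as well as the normalization itself, are hypothetical choices made for illustration.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the stored row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 for no overlap."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


def similarity_matrix(doc_a: List[str], doc_b: List[str]) -> List[List[float]]:
    """Entry [i][j] compares sentence i of doc_a with sentence j of doc_b."""
    return [[similarity(s, t) for t in doc_b] for s in doc_a]
```

Because the comparison works on characters, inflected forms such as "document" and "documents" differ by a single edit and still score close to 1.0, which is one way to read the abstract's remark that the Levenshtein distance stands in for stemming or lemmatization.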

Journal: Computer Assisted Mechanics and Engineering Sciences

Similar Papers

Efficient Parallel Design for Edit distance algorithm in DNA Sequence Alignment
Xu Li ... Zhenzhou Ji
International Journal of Engineering and Manufacturing | VOL. 1
29 Aug 2011

Spelling Correction Using the Levenshtein Distance and Nazief and Adriani Algorithm for Keyword Search Process Indonesian Qur'an Translation
Muhammad Iskandar Yahya ... Dewi Khairani
08 Dec 2022

Investigating the Impact of Utilizing the K-Nearest Neighbor and Levenshtein Distance Algorithms for Arabic Sentiment Analysis on Mobile Applications
Ahmed A Al-Shalabi ... Fahd Alqasemi
مجلة جامعة صنعاء للعلوم التطبيقية والتكنولوجيا | VOL. 1
30 Apr 2023

PERANCANGAN SISTEM PENDETEKSI BERITA HOAX MENGGUNAKAN ALGORITMA LEVENSHTEIN DISTANCE BERBASIS PHP
Aprillianda Pasaribu ... Relita Buaton
Jurnal Informatika Kaputama (JIK) | VOL. 4
01 Jan 2020
