Levenshtein Distance Algorithm Research Articles

Background: Third-generation sequencing offers some advantages over its next-generation sequencing predecessors, but with the caveat of a much higher error rate. Clustering related sequences is an essential task in modern biology. To accurately cluster sequences rich in errors, error type and frequency need to be accounted for. Levenshtein distance is a well-established mathematical algorithm for measuring the edit distance between words and can specifically weight insertions, deletions and substitutions. However, there are drawbacks to using Levenshtein distance in a biological context, and hence it has rarely been used for this purpose. We present novel modifications to the Levenshtein distance algorithm to optimize it for clustering error-rich biological sequencing data.

Results: We successfully introduced a bidirectional frameshift allowance with end-user determined accommodation caps combined with weighted error discrimination. Furthermore, our modifications dramatically improved the computational speed of Levenshtein distance. For simulated ONT MinION and PacBio Sequel datasets, the average clustering sensitivity for 3GOLD was 41.45% (S.D. 10.39) higher than Sequence-Levenshtein distance, 52.14% (S.D. 9.43) higher than Levenshtein distance, 55.93% (S.D. 8.67) higher than Starcode, 42.68% (S.D. 8.09) higher than CD-HIT-EST and 61.49% (S.D. 7.81) higher than DNACLUST. For biological ONT MinION data, 3GOLD clustering sensitivity was 27.99% higher than Sequence-Levenshtein distance, 52.76% higher than Levenshtein distance, 56.39% higher than Starcode, 48% higher than CD-HIT-EST and 70.4% higher than DNACLUST.

Conclusion: Our modifications to Levenshtein distance have improved its speed and accuracy compared to the classic Levenshtein distance, Sequence-Levenshtein distance and other commonly used clustering approaches on simulated and biological third-generation sequenced datasets. Our clustering approach is appropriate for datasets of unknown cluster centroids, such as those generated with unique molecular identifiers, as well as for datasets of known centroids, such as barcoded datasets. A strength of our approach is high accuracy in resolving small clusters and reducing the number of singletons.
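
The abstract notes that Levenshtein distance can weight insertions, deletions and substitutions separately. The sketch below illustrates only that general idea with the textbook dynamic-programming recurrence. It is not the 3GOLD method (it has no frameshift allowance or accommodation caps), and the function name `weighted_levenshtein` and the parameters `ins_cost`, `del_cost` and `sub_cost` are hypothetical names chosen for this example.

```python
def weighted_levenshtein(a, b, ins_cost=1.0, del_cost=1.0, sub_cost=1.0):
    """Cost of editing sequence a into sequence b with per-operation weights.

    Illustrative only: the weights are assumptions, not values from the paper.
    """
    # Row 0: turning the empty prefix of a into b[:j] takes j insertions.
    prev = [j * ins_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        # Column 0: turning a[:i] into the empty string takes i deletions.
        curr = [i * del_cost]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + del_cost,                              # delete ca
                curr[j - 1] + ins_cost,                          # insert cb
                prev[j - 1] + (0.0 if ca == cb else sub_cost),   # match or substitute
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # With unit weights this is the classic Levenshtein distance.
    print(weighted_levenshtein("kitten", "sitting"))
    # Making deletions cheaper lowers the cost of a read that lost one base.
    print(weighted_levenshtein("ACGT", "AGT", del_cost=0.5))
```

Keeping only two rows of the table holds memory at O(len(b)); the running time remains O(len(a) × len(b)).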

Social media is today a comprehensive source of data that can serve as a guide to professionals in issues related to public health. The purpose of this paper is to investigate the content of topical fluoride-related Twitter posts made over a 3-year period in order to improve our understanding of Twitter users' perceptions and treatment experiences. A continuous cross-sectional sample of Tweets on the subject of 'approaches to the topical fluoride treatment of tooth decay' was collected from the Twitter social networking platform between 1 January 2017 and 1 January 2020 using a software application developed for this research that makes use of the Twitter advanced search API. The words and phrases used to identify related Tweets were determined by screening the topical fluoride keywords of previous studies, and the search was conducted in the English language. To better arrange the collected Tweets and to make the data more meaningful, a natural language processing technique, tokenization, was first applied, after which the Tweets were converted into a set of meaningful words and regular expressions. The Tweets were then compared with each other, word by word, with the help of a word-based Levenshtein distance algorithm, after which two experts in the computational social science domain labelled each Tweet. A total of 132,358 Twitter posts referencing topical fluoride applications were collected, of which 110,847 were eliminated through the use of a word-based Levenshtein distance algorithm, and the remaining corpus of 21,511 posts was analysed and evaluated for specific content. Within the garnered data, 48.5% (n = 10,428) of the Twitter posts concerned topical fluoride treatments, and 7% (n = 1,507) reported experiences with topical fluoride treatment. Negative Twitter posts about topical fluoride treatment (26.4%, n = 5,679) vastly outnumbered those that were positive (18.1%, n = 3,897).
The current study achieved its main objectives of analysing topical fluoride application-related posts made on social media. From the garnered Twitter data, it can be understood that Twitter users regularly share their concerns and negative sentiments about the side effects of topical fluoride treatments on the platform. Future explorations of social media may aid public health and dental professionals in the development of strategies to educate the public and to raise awareness of the importance of topical fluoride applications.
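
The word-by-word comparison described above can be sketched as a Levenshtein distance computed over tokens rather than characters, followed by a filter that drops posts too close to one already kept. This is a minimal sketch under stated assumptions: the abstract does not give the tokenizer, the elimination rule or any threshold, so the whitespace tokenization, the function names `word_levenshtein` and `drop_near_duplicates`, and the `max_distance` parameter are all hypothetical.

```python
def word_levenshtein(x, y):
    """Levenshtein distance where one edit inserts, deletes or replaces a whole word.

    Assumption: lowercase whitespace tokenization stands in for the study's tokenizer.
    """
    xs, ys = x.lower().split(), y.lower().split()
    prev = list(range(len(ys) + 1))
    for i, wx in enumerate(xs, start=1):
        curr = [i]
        for j, wy in enumerate(ys, start=1):
            curr.append(min(
                prev[j] + 1,                 # delete word wx
                curr[j - 1] + 1,             # insert word wy
                prev[j - 1] + (wx != wy),    # keep or replace the word
            ))
        prev = curr
    return prev[-1]


def drop_near_duplicates(posts, max_distance=2):
    """Keep a post only if it is more than max_distance word edits from every kept post.

    max_distance is a hypothetical threshold, not a value reported in the study.
    """
    kept = []
    for post in posts:
        if all(word_levenshtein(post, k) > max_distance for k in kept):
            kept.append(post)
    return kept


if __name__ == "__main__":
    sample = [
        "fluoride varnish helps prevent tooth decay",
        "Fluoride varnish helps prevent tooth decay today",
        "my dentist appointment was cancelled again",
    ]
    for post in drop_near_duplicates(sample):
        print(post)
```

The pairwise filter is quadratic in the number of posts, so a corpus of the size reported here would likely need blocking or hashing in front of it; that choice is outside what the abstract describes.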

Articles published on Levenshtein Distance Algorithm

3GOLD: optimized Levenshtein distance for clustering third-generation sequencing data

Mitigation Impact of Energy and Time Delay for Computation Offloading in an Industrial IoT Environment Using Levenshtein Distance Algorithm

Topical Fluoride Applications Related Posts Analysis on Twitter Using Natural Language Processing.

STEMMING BAHASA JAWA MENGGUNAKAN DAMERAU LEVENSHTEIN DISTANCE (DLD)

Enhancing text classification performance by preprocessing misspelled words in Indonesian language

Designing a word recommendation application using the Levenshtein Distance algorithm

Integrated modelling of automobile maintenance expert system based on knowledge graph

Dig That Lick (DTL): Analyzing Large-Scale Data for Melodic Patterns in Jazz Performances

Analyzing and Experimenting Open Source OCR Engines in RPA with Levenshtein Distance Algorithm

Hybrid Spelling Correction and Query Expansion for Relevance Document Searching

Spelling Checker using Algorithm Damerau Levenshtein Distance and Cosine Similarity

Analisis Sentimen Kinerja KPU Pemilu 2019 Menggunakan Algoritma K-Means Dengan Algoritma Confix Stripping Stemmer

Sistem Pendeteksi Dini Plagiarisme Menggunakan Algoritma Levenshtein Distance

Autocorrect on Drugs e-Dictionary Search Module Using Levenshtein Distance Algorithm

PERANCANGAN SISTEM PENDETEKSI BERITA HOAX MENGGUNAKAN ALGORITMA LEVENSHTEIN DISTANCE BERBASIS PHP

KWA: A New Method of Calculation and Representation Accuracy for Speech Keyword Spotting in String Results

Similarity Based Information Retrieval Using Levenshtein Distance Algorithm

Similarity detection based on document matrix model and edit distance algorithm

Query Suggestion on Drugs e-Dictionary Using the Levenshtein Distance Algorithm

Fuzzy string implementation matching on android-based encyclopedia and anatomy quiz

