Abstract

Spelling errors commonly occur during document writing. They may stem from gaps in the authors’ vocabulary or from striking the wrong key on the keyboard. The most frequent error types are insertion of an extra letter, deletion of one letter, substitution of one letter, and transposition of two adjacent letters. This study aims to identify the most common type of spelling error, using the list of commonly misspelled words submitted by Wikipedia contributors. A brief overview of the Levenshtein and N-gram distance techniques is provided to describe the technical approaches used to achieve this purpose. The two techniques are used to predict, from an English dictionary, the correct word for each misspelling. The study shows that Levenshtein works well for correcting single-letter substitutions and transpositions of two adjacent letters, while N-gram is effective at fixing words with an omitted letter. The overall result is then evaluated by recall to see which technique corrects the misspellings better. Since the recall of Levenshtein is higher than that of N-gram, it is concluded that Levenshtein corrects the misspelled words more often.
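To make the two techniques and the recall evaluation concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the study's implementation: the N-gram distance is taken here as one minus the Dice coefficient over character bigrams, recall is taken as the fraction of misspellings whose top suggestion equals the intended word, and the function names, sample words, and tiny dictionary are all hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-letter insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def bigrams(word: str) -> set:
    """Set of overlapping two-character substrings of the word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def ngram_distance(a: str, b: str) -> float:
    """Assumed variant: 1 - Dice coefficient over character bigram sets."""
    ga, gb = bigrams(a), bigrams(b)
    if not ga and not gb:
        return 0.0
    return 1.0 - 2 * len(ga & gb) / (len(ga) + len(gb))


def suggest(word: str, dictionary: list, distance) -> str:
    """Return the dictionary word closest to `word` (first one wins on ties)."""
    return min(dictionary, key=lambda candidate: distance(word, candidate))


def recall(pairs: list, dictionary: list, distance) -> float:
    """Fraction of (misspelling, intended) pairs whose suggestion is the intended word."""
    hits = sum(suggest(wrong, dictionary, distance) == right for wrong, right in pairs)
    return hits / len(pairs)


# Hypothetical toy data, not taken from the Wikipedia list used in the study.
dictionary = ["accommodate", "accumulate", "government", "governor"]
pairs = [("accomodate", "accommodate"), ("goverment", "government")]
print(recall(pairs, dictionary, levenshtein))
print(recall(pairs, dictionary, ngram_distance))
```

Note that this plain Levenshtein variant counts a transposition of two adjacent letters as two edits; whether the study uses this variant or one with a dedicated transposition operation is not stated in this excerpt.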

Highlights

  • During document writing, a typist tends to produce spelling errors

  • A study by Madi Murdilan et al. used the techniques implemented in this study, Levenshtein and N-gram, to correct typographical errors made by elementary students in their Indonesian essays [9]

  • This study provides the first analysis of the common types of spelling error made by Wikipedia authors, leveraging the Levenshtein distance and N-gram algorithms


Summary

INTRODUCTION

During document writing, a typist tends to produce spelling errors. According to a previous study, more than 80 percent of spelling errors fall into one of a small set of error types [5]. The first of these is insertion, in which a word contains one extra letter. A related study identified typographical errors in Indonesian-language documents using Levenshtein and an N-gram model [7]. A study by Madi Murdilan et al. used the techniques implemented in this study, Levenshtein and N-gram, to correct typographical errors made by elementary students in their Indonesian essays [9]. This study intends to determine the most common type of spelling error by applying these spelling correction methods. The list of commonly misspelled English words compiled by Wikipedia contributors is chosen as the dataset of this study [15]. The list of misspelled words is converted into a dataset of tokens.
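As an illustration of how a misspelling could be assigned to one of the single-error types named in the abstract (insertion, deletion, substitution, transposition), the following Python sketch compares a misspelled token with its intended word. The function name `classify_error` and the sample word pairs are hypothetical and are not taken from the study's dataset; anything that is not exactly one such error is labeled "other".

```python
def classify_error(misspelled: str, intended: str) -> str:
    """Label a (misspelled, intended) pair with a single-error type, else 'other'."""
    m, c = misspelled, intended
    if len(m) == len(c) + 1:
        # Insertion: dropping one letter from the misspelling gives the intended word.
        if any(m[:i] + m[i + 1:] == c for i in range(len(m))):
            return "insertion"
    elif len(m) + 1 == len(c):
        # Deletion: dropping one letter from the intended word gives the misspelling.
        if any(c[:i] + c[i + 1:] == m for i in range(len(c))):
            return "deletion"
    elif len(m) == len(c):
        diffs = [i for i in range(len(m)) if m[i] != c[i]]
        if len(diffs) == 1:
            return "substitution"
        # Transposition: exactly two adjacent positions differ and are swapped.
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and m[diffs[0]] == c[diffs[1]]
                and m[diffs[1]] == c[diffs[0]]):
            return "transposition"
    return "other"


# Hypothetical sample pairs chosen only to exercise each branch.
samples = [
    ("untill", "until"),
    ("goverment", "government"),
    ("definate", "definite"),
    ("recieve", "receive"),
]
for wrong, right in samples:
    print(wrong, "->", right, ":", classify_error(wrong, right))
```

Counting these labels over a list of (misspelling, correction) pairs would give a simple frequency table of error types, which is one plausible way to approach the question this study poses.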

