Development of Bangla Spell and Grammar Checkers: Resource Creation and Evaluation

Nahid Hossain, Mohammad Nurul Huda, Salekul Islam

doi:10.1109/access.2021.3119627

Abstract

A spell and grammar checker is essential for a wide range of publications, particularly in Bangla, which is spoken by millions of native speakers around the world. Given the lack of research effort in this area, we present the development of a comprehensive Bangla spell and grammar checker together with the necessary resources. First, a full-fledged, generalised Bangla monolingual corpus comprising over 100 million words was built by scraping reputed and diverse online sources, and an extensive Bangla lexicon of over 1 million unique words was then extracted from that corpus. Based on this corpus and lexicon, we developed a combined spell and grammar checker application that simultaneously detects distinct spelling and grammatical mistakes and provides appropriate suggestions for both. The spell checker uses the Double Metaphone algorithm and edit distance, based on the distributed lexicons and a numerical suffix dataset, to detect all types of Bangla spelling mistakes with an individual accuracy of 97.21%. The grammar checker detects errors based on language model probability, i.e. a combination of bigrams and trigrams, and generates suggestions based on the cosine similarity measure, with an individual accuracy of 94.29%. The datasets and code used in this work are freely available at https://git.io/JzJ4w.
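The edit-distance part of the spell checker described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `edit_distance` and `suggest`, the four-word `TOY_LEXICON`, and the distance cut-off of 2 are all assumptions made for illustration, and the Double Metaphone phonetic step is not shown. The sketch also compares strings by Unicode code point, so Bangla vowel signs count as separate characters.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def suggest(word, lexicon, max_distance=2):
    """Return lexicon words within max_distance of word, closest first.

    A word already in the lexicon is treated as correct (no suggestions).
    """
    if word in lexicon:
        return []
    scored = sorted((edit_distance(word, w), w) for w in lexicon)
    return [w for d, w in scored if d <= max_distance]


# Hypothetical toy lexicon; the paper's lexicon has over 1 million words.
TOY_LEXICON = {"বাংলা", "ভাষা", "বানান", "ব্যাকরণ"}

if __name__ == "__main__":
    print(suggest("বাংল", TOY_LEXICON))
```

Scanning the whole lexicon for every misspelled word is only practical for a toy example; a lexicon of the size reported in the abstract would need some form of indexing or partitioning, which this sketch does not attempt.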

Highlights

This section reviews related work on three interconnected but distinct segments of our proposed system: corpus and lexicon, spell checker, and grammar checker.
By studying the available Bangla corpora, lexicons, spell checkers, and grammar checkers, we have identified several limitations in current approaches, including the scarcity of a balanced and extensive corpus, a substantial lexicon, and efficient spell and grammar checkers.
We have developed a combined solution for spell and grammar checking; the details of the two checkers are presented in separate sections for simplicity and better understanding.
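For the grammar-checking half of the combined solution, the abstract states that errors are detected from a language model probability that combines bigrams and trigrams. The sketch below shows one way such a detector could look. The source does not say how the two probabilities are combined, so the linear interpolation weight `lam`, the `threshold`, the three-sentence `TOY_SENTENCES` corpus, and all function names are assumptions for illustration only; suggestion generation with cosine similarity is not shown.

```python
from collections import Counter


def train(sentences):
    """Count unigrams, bigrams and trigrams in tokenised sentences."""
    uni, bi, tri = Counter(), Counter(), Counter()
    for tokens in sentences:
        uni.update(tokens)
        bi.update(zip(tokens, tokens[1:]))
        tri.update(zip(tokens, tokens[1:], tokens[2:]))
    return uni, bi, tri


def trigram_score(w1, w2, w3, counts, lam=0.5):
    """Interpolate P(w3 | w1, w2) and P(w3 | w2); lam is an assumed weight."""
    uni, bi, tri = counts
    p_tri = tri[(w1, w2, w3)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0
    p_bi = bi[(w2, w3)] / uni[w2] if uni[w2] else 0.0
    return lam * p_tri + (1 - lam) * p_bi


def flag_errors(tokens, counts, threshold=0.5):
    """Return the index of the last word of every low-scoring trigram."""
    windows = zip(tokens, tokens[1:], tokens[2:])
    return [i + 2 for i, (a, b, c) in enumerate(windows)
            if trigram_score(a, b, c, counts) < threshold]


# Hypothetical toy corpus: "I eat rice", "He eats rice", "I read a book".
TOY_SENTENCES = [
    ["আমি", "ভাত", "খাই"],
    ["সে", "ভাত", "খায়"],
    ["আমি", "বই", "পড়ি"],
]

if __name__ == "__main__":
    counts = train(TOY_SENTENCES)
    # First-person subject with a third-person verb form.
    mismatched = TOY_SENTENCES[0][:2] + [TOY_SENTENCES[1][2]]
    print(flag_errors(mismatched, counts))
```

With a corpus this small the scores are very coarse; the point is only that an unseen trigram pulls the combined score down even when its final bigram has been observed.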

Summary

Introduction

This section reviews related work on three interconnected but distinct segments of our proposed system: corpus and lexicon, spell checker, and grammar checker.

A. CORPUS AND LEXICON

A corpus is a collection of written texts, especially the entire works of a particular author or a body of writing on a particular subject. A lexicon is a vocabulary, a collection of words, or a complete set of meaningful units in a language. The Central Institute of Indian Languages (CIIL) [5] first introduced a Bengali corpus, along with corpora for nine other Indian languages, in 2001.
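To make the distinction between a corpus and a lexicon concrete, the sketch below derives a lexicon (the set of unique words) from a tiny corpus (running text). It is an illustration only: the regular expression, which treats any run of characters in the Unicode Bengali block (U+0980 to U+09FF) as a word, and the function name `build_lexicon` are assumptions, not the extraction procedure used for the paper's corpus.

```python
import re
from collections import Counter

# Assumed tokenisation rule: a word is a run of Bengali-block characters.
# The danda sentence mark (U+0964) lies outside this block and is skipped.
BANGLA_WORD = re.compile(r"[\u0980-\u09FF]+")


def build_lexicon(corpus_text):
    """Return (lexicon, frequencies) for the given corpus text.

    lexicon is the set of unique words; frequencies maps word -> count.
    """
    frequencies = Counter(BANGLA_WORD.findall(corpus_text))
    return set(frequencies), frequencies


if __name__ == "__main__":
    sample = "বাংলা ভাষা। বাংলা বানান।"
    lexicon, frequencies = build_lexicon(sample)
    print(len(lexicon), frequencies.most_common(1))
```

Keeping the frequency counts alongside the lexicon is a convenience in this sketch; they could, for example, help filter out very rare tokens that are likely to be typos in the scraped text.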


Journal: IEEE Access	Publication Date: Jan 1, 2021
Citations: 9	License type: CC BY 4.0

Similar Papers

A statistical and rule-based spelling and grammar checker for Indonesian text
Asanilta Fahda ... Ayu Purwarianti
01 Nov 2017

A Rule-Based Grammar and Spell Checking
Amit Nayak ... Yash Gondaliya
SAMRIDDHI : A Journal of Physical Sciences, Engineering and Technology | VOL. 14
25 Mar 2022

Validating the TEMAA LE evaluation methodology: a case study on Danish spelling checkers
Patrizia Paggio ... Nancy L Underwood
Natural Language Engineering | VOL. 4
01 Sep 1998

Non-word error detection in current South African spellcheckers
Dj Prinsloo ... Gilles-Maurice De Schryver
Southern African Linguistics and Applied Language Studies | VOL. 21
01 Nov 2003
