Brown Corpus Research Articles

Purpose
According to the Indian Sign Language Research and Training Centre (ISLRTC), India has approximately 300 certified human interpreters to help people with hearing loss. This paper addresses Indian Sign Language (ISL) sentence recognition and translation into semantically equivalent English text in a signer-independent mode.
Design/methodology/approach
This study presents an approach that translates ISL sentences into English text using the MobileNetV2 model and Neural Machine Translation (NMT). The authors created an ISL corpus from the Brown corpus using ISL grammar rules to perform machine translation. Their approach converts ISL videos of the newly created dataset into ISL gloss sequences using the MobileNetV2 model; each recognized gloss sequence is then fed to a machine translation module that generates an English sentence for the corresponding ISL sentence.
Findings
The experimental results show that the pretrained MobileNetV2 model was the best-suited model for recognizing ISL sentences and that NMT outperformed Statistical Machine Translation (SMT) in converting ISL text into English text. The automatic and human evaluations of the proposed approach yielded accuracies of 83.3% and 86.1%, respectively.
Research limitations/implications
The NMT systems occasionally produced translations that repeated already-translated words, produced odd translations as the number of words per sentence increased, and introduced one or more terms unrelated to the source text. The most common type of error is the mistranslation of places, numbers and dates. Although this has little effect on the overall structure of the translated sentence, it indicates that the embeddings learned for these few words could be improved.
Originality/value
Sign language recognition and translation is a crucial step toward improving communication between the Deaf and the rest of society. Given the shortage of human interpreters, an alternative approach is needed to enable smooth communication with the Deaf. Because no public dataset of ISL videos incorporating the signs released by ISLRTC was available, the authors created a new video dataset and ISL corpus. To encourage research in this field, they generated an ISL corpus of 13,720 sentences and a video dataset of 47,880 ISL videos.
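The abstract states that the ISL corpus was derived from the Brown corpus using ISL grammar rules but does not list those rules. The sketch below illustrates the general idea of rule-based English-to-gloss conversion under loudly stated assumptions: the rule set (drop articles and forms of "be", move verbs to the end to approximate subject-object-verb order, place wh-question words last) and the tiny VERBS lexicon are hypothetical choices made for illustration, not the authors' rules. A realistic converter would need a POS tagger and a much richer grammar.

```python
# Simplified, illustrative rules (assumptions, not the authors' rule set).
ARTICLES = {"a", "an", "the"}
BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}
DROPPED = ARTICLES | BE_FORMS
WH_WORDS = {"what", "where", "when", "who", "why", "how", "which"}
# Hypothetical toy verb lexicon; a real converter would use a POS tagger.
VERBS = {"eat", "go", "like", "want", "read"}


def english_to_isl_gloss(sentence: str) -> list[str]:
    """Convert an English sentence into an uppercase ISL-style gloss sequence."""
    # Lowercase each token and strip sentence punctuation from it.
    tokens = [t.strip(".,?!").lower() for t in sentence.split()]
    # Drop articles and forms of "be", which these toy rules treat as unsigned.
    tokens = [t for t in tokens if t and t not in DROPPED]
    wh = [t for t in tokens if t in WH_WORDS]
    verbs = [t for t in tokens if t in VERBS]
    rest = [t for t in tokens if t not in WH_WORDS and t not in VERBS]
    # Approximate SOV order: other words, then verbs, then question words.
    return [t.upper() for t in rest + verbs + wh]


print(english_to_isl_gloss("I eat an apple."))       # ['I', 'APPLE', 'EAT']
print(english_to_isl_gloss("Where is the market?"))  # ['MARKET', 'WHERE']
```

Pairing each generated gloss sequence with its source English sentence is one plausible way such a rule-derived corpus could supply parallel data for training a gloss-to-English translation model.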

It has been argued that most of corpus linguistics involves one of four fundamental methods: frequency lists, dispersion, collocation, and concordancing. All of these presuppose (if only implicitly) the definition of a unit: the element whose frequencies in a corpus, in corpus parts, or around a search word are counted (or quantified in other ways). Usually, and with most corpus-processing tools, a unit is an orthographic word. However, this is obviously a simplifying assumption born of convenience: it seems more intuitive to treat "because of" or "in spite of" as one unit each rather than as two or three. Some work in computational linguistics has developed multi-word unit (MWU) identification algorithms, which typically rely on co-occurrence token frequencies and association measures (AMs), but these have not become widespread in corpus-linguistic practice, even though recognizing MWUs like these would have a profound impact on nearly all corpus statistics that involve (simplistic notions of) words/units. In this programmatic proof-of-concept paper, I introduce and exemplify an algorithm to identify MWUs that goes beyond frequency and bidirectional association by also involving several well-known but underutilized dimensions of corpus-linguistic information: frequency (how often does a potential unit like in_spite_of occur?); dispersion (how widespread is the use of a potential unit?); association (how strongly attracted are the parts of a potential unit?); and entropy (how variable is each slot in a potential unit?). The proposed algorithm can use all these dimensions and weight them differently. I will (i) present the algorithm in detail, (ii) exemplify its application to the Brown corpus, (iii) discuss its results on the basis of several kinds of MWUs it returns, and (iv) discuss next analytical steps.
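The claim that recognizing MWUs changes basic corpus statistics can be illustrated with a small sketch. It assumes a hand-written, hypothetical MWU inventory (MWUS) and merges matches greedily, longest first, into underscore-joined tokens such as in_spite_of; in the approach described above, the inventory would come from the identification algorithm rather than from a fixed list.

```python
from collections import Counter

# Hypothetical MWU inventory, hand-written for illustration only.
MWUS = [("in", "spite", "of"), ("because", "of")]


def merge_mwus(tokens, mwus):
    """Greedily merge known MWUs (longest first) into single tokens."""
    by_length = sorted(mwus, key=len, reverse=True)
    out, i = [], 0
    while i < len(tokens):
        for mwu in by_length:
            if tuple(tokens[i:i + len(mwu)]) == mwu:
                out.append("_".join(mwu))  # e.g. "in_spite_of"
                i += len(mwu)
                break
        else:
            # No MWU starts here: keep the orthographic word as its own unit.
            out.append(tokens[i])
            i += 1
    return out


tokens = "he stayed in spite of the rain because of the game".split()
merged = merge_mwus(tokens, MWUS)
print(len(tokens), Counter(tokens)["of"])  # 11 2
print(len(merged), Counter(merged)["of"])  # 8 0
```

Even in this one sentence, the corpus size drops from 11 to 8 units and "of" disappears as a free word, which in turn changes every frequency-based or association-based statistic computed over these units.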

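The four dimensions named in the MWU abstract can be sketched for a two-word candidate as follows. The specific measures are assumptions chosen for illustration and are not necessarily those used in the paper: Gries's DP for dispersion, the two directional ΔP values for association, and the Shannon entropy of the word types filling each slot for slot variability. The weighted combination in the hypothetical mwu_score function, including how each dimension is oriented and the choice of the weaker ΔP direction, is likewise an assumption; a real implementation would put the dimensions on comparable scales before weighting them.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (bits) of a frequency distribution."""
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def mwu_dimensions(parts, w1, w2):
    """Frequency, dispersion, association and entropy for the bigram (w1, w2).

    parts: corpus parts (e.g. the files of a corpus), each a list of tokens.
    """
    part_bigrams = [list(zip(p, p[1:])) for p in parts]
    all_bigrams = [bg for pb in part_bigrams for bg in pb]
    n = len(all_bigrams)
    freq = all_bigrams.count((w1, w2))
    if freq == 0:
        raise ValueError("candidate does not occur in the corpus")

    # Dispersion: Gries's DP (0 = perfectly even, near 1 = clumped);
    # expected shares are proportional to each part's number of bigram slots.
    dp = 0.5 * sum(
        abs(pb.count((w1, w2)) / freq - len(pb) / n) for pb in part_bigrams
    )

    # Association: directional Delta-P from the 2x2 table of bigram slots.
    a = freq
    b = sum(1 for x, y in all_bigrams if x == w1 and y != w2)
    c = sum(1 for x, y in all_bigrams if x != w1 and y == w2)
    d = n - a - b - c
    delta_p_2_1 = a / (a + b) - (c / (c + d) if c + d else 0.0)
    delta_p_1_2 = a / (a + c) - (b / (b + d) if b + d else 0.0)

    # Entropy: variability of each slot (lower = more fixed).
    h_slot1 = entropy(Counter(x for x, y in all_bigrams if y == w2))
    h_slot2 = entropy(Counter(y for x, y in all_bigrams if x == w1))

    return {"frequency": freq, "DP": dp,
            "delta_p_2_1": delta_p_2_1, "delta_p_1_2": delta_p_1_2,
            "h_slot1": h_slot1, "h_slot2": h_slot2}


def mwu_score(dims, weights):
    """Hypothetical weighted sum; each feature is oriented so higher = more unit-like."""
    features = {
        "frequency": math.log2(dims["frequency"]),
        "dispersion": 1.0 - dims["DP"],
        "association": min(dims["delta_p_2_1"], dims["delta_p_1_2"]),
        "entropy": -(dims["h_slot1"] + dims["h_slot2"]),
    }
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


parts = [
    ["in", "spite", "of", "the", "rain"],
    ["in", "spite", "of", "it"],
    ["in", "the", "house"],
]
dims = mwu_dimensions(parts, "in", "spite")
print(dims["frequency"], round(dims["DP"], 3), dims["h_slot1"])  # 2 0.222 0.0
```

In this toy corpus, "spite" is always preceded by "in" (slot-1 entropy 0), whereas "in" is followed by two different words (slot-2 entropy of about 0.918 bits). This asymmetry is the kind of slot-level information that frequency and a single symmetric association score would not reveal.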
Articles published on Brown Corpus

A Method for Measuring Word Sequence Complexity of Text

TF-IDF combined rank factor Naive Bayesian algorithm for intelligent language classification recommendation systems

Text Segmentation Via Processes that Count the Number of Different Words Forward and Backward

Use of Linguistic Corpora in the Study of Word Formation in Dialects of the American Variety of Modern English

An approach based on deep learning for Indian sign language translation

A Machine Translation System from Indian Sign Language to English Text

Conformal Prediction for Text Infilling and Part-of-Speech Prediction

An auxiliary Part‐of‐Speech tagger for blog and microblog cyber‐slang

Multi-word units (and tokenization more generally): a multi-dimensional and largely information-theoretic approach

Using Human Intelligence to Test the Impact of Popular Preprocessing Steps and Feature Extraction in the Analysis of Human Language

Semantic role labeling for knowledge graph extraction from text

Traces of American English in Pakistani English: A Comparative Multi-dimensional Study of Press Editorials

Part of Speech Tagging Using Hidden Markov Models

The interaction of various temporal devices in the use of past followed by temporal nouns

Presuppositions and Assertions: A Case Study on Acquisition of Most Common Presupposition Triggers in Early Childhood

English Raising Predicates and (Non-)Finite Clauses

Functional Text Dimensions for the annotation of web corpora

On Theoretical and Practical Aspects of Specialized Translation

CRNN: A Joint Neural Network for Redundancy Detection

Prepositions from the Perspective of Cognitive Study