A Hybrid Approach to Error Detection in a Treebank

Rahul Agarwal, Dipti Misra Sharma, Bharat Ram Ambati

doi:10.33011/lilt.v7i.1303

Open Access

Abstract

Treebanks are a linguistic resource: a large database in which the morphological, syntactic and lexical information for each sentence has been explicitly marked. The importance of treebanks for various NLP activities (research and applications) is well known, which also means that treebanks need to be as error-free as possible. However, manual validation of a treebank is very costly in terms of both time and money. This paper describes an approach to automatically detect errors in a treebank after manual annotation is complete, improving on an earlier error detection tool for a Hindi treebank (Ambati et al., 2011). We also present a user study showing that our system significantly reduces validation time while detecting 81.49% of the errors at the dependency level.

Highlights

Treebanks have proved to be a crucial resource for NLP research and developing solutions for various NLP related applications
A treebank should be error-free, considering its role in providing appropriate linguistic knowledge
The PBSM proposed by Ambati et al. (2011) extracts contextual features, trains on gold-standard training data validated by linguistic experts, builds a model using the maximum entropy classification algorithm (MAXENT), tests the system on the testing data, and obtains probabilities for all possible dependency tags
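To make the PBSM idea in the last highlight concrete, here is a minimal sketch, not the implementation of Ambati et al. (2011). It assumes: a small hand-rolled MaxEnt classifier (equivalent to multinomial logistic regression) in place of whatever MAXENT toolkit the authors used; hypothetical feature strings such as "post=ne" and "head=VM" with toy training data; and a hypothetical decision rule that flags a node when the probability assigned to its annotated dependency tag falls below a threshold. The names train_maxent, flag_suspicious and threshold are illustrative only.

```python
import math
from collections import defaultdict

def predict_proba(weights, tags, feats):
    """Softmax over tags of summed feature weights (a MaxEnt model)."""
    scores = {t: sum(weights.get((f, t), 0.0) for f in feats) for t in tags}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {t: math.exp(s - top) for t, s in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

def train_maxent(examples, epochs=300, lr=0.5, l2=0.01):
    """Batch gradient ascent on the L2-regularised conditional log-likelihood.
    examples: list of (feature_list, gold_tag) pairs."""
    tags = sorted({tag for _, tag in examples})
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in examples:
            probs = predict_proba(weights, tags, feats)
            for t in tags:
                # observed minus expected count for each (feature, tag) pair
                residual = (1.0 if t == gold else 0.0) - probs[t]
                for f in feats:
                    grad[(f, t)] += residual
        for key, g in grad.items():
            weights[key] += lr * (g / len(examples) - l2 * weights[key])
    return weights, tags

def flag_suspicious(weights, tags, annotated, threshold=0.2):
    """Hypothetical rule: return (index, annotated_tag, most_likely_tag) for
    every node whose annotated tag gets probability below `threshold`."""
    flagged = []
    for i, (feats, tag) in enumerate(annotated):
        probs = predict_proba(weights, tags, feats)
        if probs.get(tag, 0.0) < threshold:
            flagged.append((i, tag, max(probs, key=probs.get)))
    return flagged

# Hypothetical toy data: child's postposition, head POS, animacy
train = [
    (["post=ne", "head=VM"], "k1"),
    (["post=ne", "head=VM"], "k1"),
    (["post=0", "head=VM", "anim=yes"], "k1"),
    (["post=ko", "head=VM"], "k2"),
    (["post=0", "head=VM", "anim=no"], "k2"),
    (["post=meM", "head=VM"], "k7p"),
]
weights, tags = train_maxent(train)
annotated = [(["post=ne", "head=VM"], "k2"),  # tag disagrees with training data
             (["post=ko", "head=VM"], "k2")]  # tag agrees with training data
flags = flag_suspicious(weights, tags, annotated)
print(flags)
```

Ranking nodes by the probability of their annotated tag, rather than only checking whether it is the top prediction, lets a validator inspect the most suspicious annotations first; the threshold value here is an arbitrary assumption.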

Summary

Introduction

Treebanks have proved to be a crucial resource for NLP research and for developing solutions for various NLP applications. Automatic error detection tools are needed to reduce validation time. A semi-automatic procedure annotates the grammatical information using tools, and the output of these tools is then manually checked and corrected. Both manual and semi-automatic annotation procedures may leave errors in the treebank on the first attempt. We improve on the mechanism proposed by Ambati et al. (2011) for detecting dependency annotation errors. For more details on the types of errors we extract from the Hindi dependency treebank, please refer to our previous work (Ambati et al., 2011).

Related Work

Our Approach

Rule-based Correction

Experiment and Results

Table: Comparison of the performance of the overall hybrid system at the dependency level

User Studies

A few system improvements to reduce the validation time

Conclusion

Journal: Linguistic Issues in Language Technology
Publication Date: Jan 1, 2012
Citations: 4
License type: cc-by

Similar Papers

Chinese Hedge Scope Detection Based on Structure and Semantic Information
Huiwei Zhou ... Junli Xu
01 Jan 2015

Linguistic features in Turkish word representations
Onur Gungor ... Eray Yildiz
01 May 2017

Early Error Detection and Classification in Data Transfer Scheduling
Mehmet Balman ... Tevfik Kosar
01 Mar 2009

From detection/correction to computer aided writing
Damien Genthial ... Jacques Courtin
01 Jan 1992
