ion is introduced. Since MBLP does not abstract over the training data, it is called a lazy learning approach. Rule induction, in contrast, learns rules and does not go back to the actual training data during classification.

∗ A shorter version of this review will be published in German in the journal Linguistische Berichte.

The book consists of 7 chapters. Chapter 1 situates memory-based language processing firmly in the domain of empirical approaches to NLP. Empirical approaches became attractive in the early 1990s, largely replacing knowledge-based approaches. Daelemans and van den Bosch argue that, within the range of empirical approaches, memory-based learning offers an advantage over statistical approaches in that it does not abstract over low-frequency events. Such low-frequency events matter in processing natural language because they often describe exceptions or subregularities. The chapter also introduces the major concepts of MBLP and provides an intuitive example from linguistics: PP attachment.

Chapter 2 locates central concepts of MBLP in neighboring areas of research: in linguistics, the idea of processing by analogy to previous experience is a well-known concept; psycholinguistics often uses exemplar-based approaches or, more recently, hybrid approaches that combine rules with exceptions. Applications of memory-based principles can also be found in example-based machine translation (Nagao 1984) and data-oriented parsing (Bod 1998).

Chapter 3 gives a simultaneous introduction to memory-based learning and to TiMBL, the Tilburg implementation of the method. This strategy of combining theory and practice gives the reader an impression of the importance of selecting optimal parameter settings for different problems. The application of TiMBL is demonstrated on the example of plural formation in German. The chapter ends with an introduction to evaluation methodology and to TiMBL's built-in evaluation functions.

Chapter 4 describes the application of TiMBL to two more complex linguistic examples: grapheme-to-phoneme conversion and morphological analysis. In order to find optimal solutions for these problems, two algorithms that deviate from the standard memory-based learning algorithm are introduced: IGTREE and TRIBL. IGTREE is a decision-tree approximation that bases the comparison of an example to others on a small number of feature comparisons. TRIBL is a hybrid between the standard memory-based learning algorithm, IB1, and IGTREE. Both modifications reduce memory requirements and processing time during classification, but they may also affect classification accuracy. Unfortunately, the presentation of the first example suffers from unreadable phonetic transcriptions throughout the chapter.

Whereas Chapter 4 analyzes linguistic problems that are easily described in terms of classification, Chapter 5 approaches a problem of sequence learning: partial parsing. For this task, phrase and clause boundaries must be found. In order to apply classification methods to sequence learning, the problem must be redefined as assigning tags to words or word combinations, so-called IOB tagging (Ramshaw and Marcus 1995). These tags indicate whether a word constitutes a boundary or not. One advantage of using MBLP for such problems lies in the fact that different types of information, including long-distance information, can be included without modification of the original algorithm.
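As a minimal sketch of this reformulation (not taken from the book; the sentence, tags, and window size below are purely illustrative), each word of a chunked sentence receives an IOB tag (B for the first word of a phrase, I for a word inside a phrase, O for a word outside any phrase), and a small window of surrounding words serves as the feature vector on which a memory-based classifier such as TiMBL could then be trained:

# Illustrative sketch: recasting partial parsing as per-word classification.
# The example sentence, its IOB chunk tags, and the window size are
# assumptions for illustration, not data from the book or from TiMBL.

words = ["He", "reckons", "the", "current", "account", "deficit", "will", "narrow"]
iob   = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "I-NP", "B-VP", "I-VP"]

def window_instances(words, tags, size=1):
    """Turn a tagged sentence into fixed-width classification instances:
    (left context, focus word, right context) -> IOB tag."""
    padded = ["_"] * size + words + ["_"] * size
    instances = []
    for i, tag in enumerate(tags):
        features = tuple(padded[i : i + 2 * size + 1])
        instances.append((features, tag))
    return instances

for features, tag in window_instances(words, iob):
    print(features, "->", tag)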
In Chapter 6, Daelemans and van den Bosch investigate the difference between lazy and eager learning. As noted earlier, TiMBL is a typical example of lazy learning since it does not abstract from the training data. RIPPER (Cohen 1995), the other classifier used in this chapter, is a typical eager learner: a rule-induction algorithm that displays the opposite behavior to TiMBL, namely a complex learning strategy combined with simple, efficient classification. The results presented in this chapter show that deleting examples from the training data is harmful for classification, supporting the hypothesis that lazy learning has a fitting bias for natural language problems. However, this conclusion seems a little too straightforward. Here, one would expect a reference to the findings of Daelemans and Hoste (2002), which show that parameter and feature