Abstract

This paper describes the design and evaluation of a system for the automatic detection and resolution of shell nouns in German. Shell nouns are general nouns, such as fact, question, or problem, whose full interpretation relies on a content phrase located elsewhere in a text, which these nouns simultaneously serve to characterize and encapsulate. To detect and resolve them, the system uses a series of lexico-syntactic patterns to extract shell noun candidates and their content in parallel. Each pattern has its own classifier, which makes the final decision as to whether a link is to be established and the shell noun resolved. Overall, about 26.2% of the annotated shell noun instances were correctly identified by the system, and of these cases, about 72.5% were assigned the correct content phrase. Though it remains difficult to identify shell noun instances reliably (recall is accordingly low in this regard), the system usually assigns the right content to correctly classified cases.
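The two figures reported above can be combined into a single end-to-end rate. The short calculation below is only an illustrative sketch: it assumes that the 72.5% figure is measured over exactly those instances that were correctly identified, as the wording "of these cases" suggests. The variable names are chosen here for illustration and do not come from the paper.

```python
# Share of annotated shell noun instances that the system identifies correctly
# (figure taken from the abstract).
identified_rate = 0.262

# Share of those correctly identified instances that also receive the
# correct content phrase (figure taken from the abstract).
correct_content_rate = 0.725

# Assumption: the second rate is conditional on the first, so the share of
# annotated instances that are both identified and correctly resolved is
# the product of the two.
end_to_end_rate = identified_rate * correct_content_rate

print(f"End-to-end rate: {end_to_end_rate:.1%}")
```

Under that assumption, roughly 19% of the annotated instances are both detected and resolved to the right content phrase, which is consistent with the abstract's point that detection, rather than content assignment, is the main bottleneck.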

Highlights

  • The term shell noun refers to the way in which particular general nouns are used to characterize and encapsulate a complex chunk of information for later reference, which might ordinarily be realized by a verb phrase or a sentence (Schmid, 2000).

  • To determine which of the extracted candidate pairs constitute actual shell noun instances, I use a series of Naive Bayes classifiers, which make the final decisions as to whether a given noun is to be regarded as a shell noun and resolved to some content phrase.

  • To help the system recognize nominalized content phrases, which are especially important for German-language data, I include a number of surface-level features, such as whether a lemma ends with -ung, -keit, or -heit, since these endings are typically associated with nominalized verbs or with more ‘abstract’ entities.


Summary

Introduction

The term shell noun refers to the way in which particular general nouns are used to characterize and encapsulate a complex chunk of information for later reference, which might ordinarily be realized by a verb phrase or a sentence (Schmid, 2000). To determine which of the extracted candidate pairs constitute actual shell noun instances, I use a series of Naive Bayes classifiers, which make the final decisions as to whether a given noun is to be regarded as a shell noun and resolved to some content phrase. To help the system recognize nominalized content phrases (e.g. die Möglichkeit der Aktualisierung der Software ‘the opportunity to update the software’), which are especially important for German-language data, I include a number of surface-level features, such as whether a lemma ends with -ung, -keit, or -heit, since these endings are typically associated with nominalized verbs or with more ‘abstract’ entities. In the following evaluation, I will assume that 61% of these unannotated cases are false positives, since this is the proportion of negative instances for the nouns in the corpus that are annotated.
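The suffix-based surface features mentioned above can be illustrated with a small sketch. This is not the system's actual feature extractor: the function name `surface_features`, the feature names, and the lowercasing step are assumptions made here for illustration. Only the three suffixes and the example words are taken from the text.

```python
from typing import Dict

# Suffixes named in the text as typical of nominalized verbs or
# more 'abstract' entities in German.
NOMINAL_SUFFIXES = ("ung", "keit", "heit")


def surface_features(lemma: str) -> Dict[str, bool]:
    """Return one boolean feature per suffix, plus a combined flag.

    Hypothetical helper: the feature names are illustrative and are not
    taken from the paper.
    """
    lowered = lemma.lower()
    features = {
        f"ends_with_{suffix}": lowered.endswith(suffix)
        for suffix in NOMINAL_SUFFIXES
    }
    # True if the lemma carries any of the listed suffixes.
    features["has_nominal_suffix"] = any(features.values())
    return features


# Lemmas from the example phrase
# 'die Möglichkeit der Aktualisierung der Software'.
for lemma in ("Möglichkeit", "Aktualisierung", "Software"):
    print(lemma, surface_features(lemma))
```

In this sketch, Möglichkeit and Aktualisierung each trigger one suffix feature, while Software triggers none. Boolean features of this kind could then be passed, alongside other features, to a classifier such as the Naive Bayes models described above.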

Evaluation
Related Work
Discussion and Future
