Abstract

While traditional supervised learning methods classify data based only on its physical features (e.g., distribution, similarity, or distance), high-level classification is characterized by its ability to capture topological features of the input data by means of complex network measures. Recent works have shown that combining both kinds of features makes it possible to detect a variety of patterns that the physical features alone cannot uncover. In this article, we investigate such a hybrid method for the Semantic Role Labeling (SRL) task, which consists of identifying the arguments in a sentence and classifying them with roles that indicate semantic relations between an event and its participants. Because SRL has the potential to improve many other natural language processing tasks, such as information extraction and plagiarism detection, we consider it over PropBank-br, a Brazilian Portuguese corpus built from Brazilian newspaper texts. This corpus poses a challenging classification problem: like most non-English corpora, it suffers from scarce annotated data and highly imbalanced distributions. Experiments were performed on the argument classification task over the whole corpus and, specifically, over the most frequent verbs. Results in the verb-specific scenario revealed that the high-level system obtains a considerable gain in predictive performance, even over a state-of-the-art SRL algorithm.
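To make the idea of mixing physical and topological features concrete, here is a minimal sketch of a generic hybrid classifier. It is not the system evaluated in this article. The abstract does not say how the class networks are built, which network measure is used, or how the two terms are combined. The sketch therefore assumes the following: one k-nearest-neighbor graph per class; the average clustering coefficient as the network measure; a plain kNN vote as the low-level term; and a convex combination weighted by a hypothetical parameter `lam`. The feature vectors stand in for whatever numeric representation of an SRL argument one might use. The function names (`hybrid_classify`, `avg_clustering`, etc.) are illustrative only.

```python
import math


def knn_indices(point, points, k, exclude=None):
    """Indices of the k points closest to `point`, optionally skipping one index."""
    order = sorted((i for i in range(len(points)) if i != exclude),
                   key=lambda i: math.dist(point, points[i]))
    return order[:k]


def build_knn_graph(points, k):
    """Undirected kNN graph over `points`, stored as an adjacency dict of sets."""
    adj = {i: set() for i in range(len(points))}
    for i, p in enumerate(points):
        for j in knn_indices(p, points, k, exclude=i):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def avg_clustering(adj):
    """Average local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for nbrs in adj.values():
        d = len(nbrs)
        if d < 2:
            continue
        nl = list(nbrs)
        links = sum(1 for a in range(d) for b in range(a + 1, d) if nl[b] in adj[nl[a]])
        total += 2.0 * links / (d * (d - 1))
    return total / len(adj) if adj else 0.0


def insert_node(adj, points, x, k):
    """Copy of `adj` with test point x linked to its k nearest members of the class."""
    new = {n: set(nb) for n, nb in adj.items()}
    t = len(points)  # id for the inserted test node
    new[t] = set()
    for j in knn_indices(x, points, k):
        new[t].add(j)
        new[j].add(t)
    return new


def hybrid_classify(train, labels, x, k=3, lam=0.5):
    """Hypothetical hybrid rule: (1 - lam) * low-level + lam * high-level score."""
    classes = sorted(set(labels))
    # Low-level (physical) term: share of the k nearest training points per class.
    nearest = knn_indices(x, train, k)
    low = {c: sum(labels[i] == c for i in nearest) / k for c in classes}
    # High-level (topological) term: how little inserting x perturbs each class graph.
    variation = {}
    for c in classes:
        pts = [p for p, y in zip(train, labels) if y == c]
        g = build_knn_graph(pts, k)
        variation[c] = abs(avg_clustering(insert_node(g, pts, x, k)) - avg_clustering(g))
    total = sum(variation.values())
    inv = {c: total - v for c, v in variation.items()}  # smaller change -> larger share
    s = sum(inv.values())
    high = {c: (inv[c] / s if s else 1.0 / len(classes)) for c in classes}
    scores = {c: (1 - lam) * low[c] + lam * high[c] for c in classes}
    return max(classes, key=lambda c: scores[c]), scores


train = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = ["A"] * 4 + ["B"] * 4
label, scores = hybrid_classify(train, labels, (0.5, 0.5), k=3, lam=0.5)
print(label)  # → A
```

Setting `lam=0` reduces the rule to the purely physical kNN vote. Raising `lam` lets the structural "conformity" of each class network influence the decision, which is where the hybrid approach could recover patterns that distance alone misses.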
