Abstract

Ungrammatical sentences present a challenge in a number of Natural Language Processing tasks, including automatic Question Answering. In this paper, we introduce an algorithm that identifies the most likely decomposition of a (possibly ungrammatical) sentence into its semantic roles. The algorithm first runs a chart parser, driven by a “tight” hybrid syntactic-semantic context-free grammar, which determines whether each substring can play the role of a main or subordinate clause (such as a declarative clause) or of a semantic role such as subject, predicate, or complement. An Integer Programming problem is then solved to find a maximum-likelihood covering of the sentence. At this stage, the model partitions the sentence into substrings such that (a) each substring is assigned a clause (main or secondary) and a semantic role, and (b) a measure of the overall likelihood is maximized. The validity of this approach has been assessed on a test set obtained by randomly perturbing a set of grammatical sentences of various kinds.
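
To make the covering step concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the chart parser has already produced, for each candidate span, a probability for each (clause, role) label, and that the overall likelihood is the product of the per-span probabilities. The paper solves this step as an Integer Programming problem; the sketch swaps in dynamic programming over split points, which finds the same optimum only under these simplifying assumptions (independent per-span scores, with non-overlapping full coverage as the sole constraint). The function name `best_covering`, the label strings, and all probabilities are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical chart output: (start, end) span, end exclusive -> {label: probability}.
# A label such as "main:subject" encodes the (clause, semantic role) pair.
Candidates = Dict[Tuple[int, int], Dict[str, float]]


def best_covering(n: int, candidates: Candidates) -> Tuple[float, List[Tuple[int, int, str]]]:
    """Most likely partition of tokens 0..n-1 into contiguous labelled spans.

    Dynamic programming stand-in for the paper's Integer Programming step:
    maximizes the sum of log-probabilities of the chosen spans.
    Returns (-inf, []) if no full covering exists.
    """
    best = [-math.inf] * (n + 1)  # best[j]: best log-likelihood covering tokens 0..j-1
    back: List[Optional[Tuple[int, str]]] = [None] * (n + 1)
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(j):
            if best[i] == -math.inf:
                continue  # prefix 0..i-1 cannot be covered
            for label, p in candidates.get((i, j), {}).items():
                if p <= 0.0:
                    continue  # log(0) is undefined; skip impossible labels
                score = best[i] + math.log(p)
                if score > best[j]:
                    best[j] = score
                    back[j] = (i, label)
    if best[n] == -math.inf:
        return -math.inf, []
    # Walk the back-pointers from the end of the sentence to recover the spans.
    segments: List[Tuple[int, int, str]] = []
    j = n
    while j > 0:
        i, label = back[j]
        segments.append((i, j, label))
        j = i
    segments.reverse()
    return best[n], segments


# Toy ungrammatical input "the cat sleep on mat" with made-up span probabilities.
cands: Candidates = {
    (0, 2): {"main:subject": 0.9},
    (2, 3): {"main:predicate": 0.7},
    (3, 5): {"main:complement": 0.8},
    (2, 5): {"main:predicate": 0.3},
}
score, segs = best_covering(5, cands)
print(segs)  # [(0, 2, 'main:subject'), (2, 3, 'main:predicate'), (3, 5, 'main:complement')]
```

Here the three-span covering (0.9 × 0.7 × 0.8 = 0.504) beats the two-span alternative (0.9 × 0.3 = 0.27). Global constraints, for example requiring exactly one main-clause predicate, cannot be expressed in this simple recurrence; that is one reason an Integer Programming formulation is the more general choice.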
