Abstract

Textual entailment is a relationship between two text fragments: a text (premise) and a hypothesis. It has applications in question answering systems, multi-document summarization, information retrieval systems, and social network analysis. In the digital era, recognizing semantic variability is important for understanding inferences in texts, which appear as sentences, posts, tweets, or user experiences. Understanding inferences from customer experiences, for example, helps companies with customer segmentation. The amount of digital information is ever-growing, with textual data in almost all languages, including low-resource languages. This work applies various machine learning approaches to textual entailment recognition, or natural language inference, for Malayalam, a low-resource South Indian language. A performance-based analysis using machine learning classifiers such as Logistic Regression, Decision Tree, Support Vector Machine, Random Forest, AdaBoost, and Naive Bayes is carried out on the MaNLI (Malayalam Natural Language Inference) dataset. Different lexical and surface-level features are used for both binary and multiclass classification. As the size of the dataset increases, the performance of feature-based classification drops; a comparison of the feature-based models with deep learning approaches highlights this finding. The main focus here is the feature-based analysis with 14 different features and its comparison, which is essential to any NLP classification problem.
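As a minimal sketch of the kind of lexical and surface-level features the abstract refers to (the feature names and formulas below are illustrative assumptions, not the paper's actual 14 features), a premise-hypothesis pair can be mapped to a small feature vector before being passed to a classifier:

```python
def lexical_features(premise, hypothesis):
    """Toy lexical/surface-level features for a sentence pair.
    Hypothetical features for illustration only."""
    p_tokens = premise.lower().split()
    h_tokens = hypothesis.lower().split()
    p_set, h_set = set(p_tokens), set(h_tokens)
    return {
        # fraction of hypothesis words that also occur in the premise
        "word_overlap": len(p_set & h_set) / len(h_set) if h_set else 0.0,
        # surface-level length ratio of hypothesis to premise
        "length_ratio": len(h_tokens) / len(p_tokens) if p_tokens else 0.0,
        # crude negation-mismatch flag (English "not" as a stand-in)
        "negation_mismatch": ("not" in p_set) != ("not" in h_set),
    }

feats = lexical_features("The cat sat on the mat", "The cat is not on the mat")
```

Such feature dictionaries would then be vectorized and fed to any of the listed classifiers (Logistic Regression, SVM, Random Forest, and so on).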

Highlights

  • Textual entailment (TE), also called natural language inference (NLI), is a relationship between a pair of sentences

  • Textual entailment for Indo-Aryan languages like Hindi is important to the language community of northern India. In this work, we focus on Malayalam, a language from the Dravidian family

  • Textual entailment is recognized for the Malayalam language with a feature-based approach


Introduction

Textual entailment (TE), also called natural language inference (NLI), is a relationship between a pair of sentences. It identifies the similarity between the sentences based on their inferential semantic content. The text contradicts the hypothesis if the semantic content of the hypothesis is the opposite of that of the text. A classical definition of entailment is that a text t entails a hypothesis h if h is true in every circumstance of a possible world in which t is true. This definition is too strict for real-world applications. A computable definition of textual entailment is therefore used: a hypothesis h is entailed by a text t if P(h is true | t) > P(h is true), where P(h is true | t) is the entailment confidence [1].
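The probabilistic criterion above can be illustrated with a toy possible-worlds model (the representation of worlds as (t holds, h holds) pairs is an assumption made here for illustration, not part of the cited definition):

```python
from fractions import Fraction

def prob_h(worlds):
    """P(h is true): fraction of possible circumstances in which h holds."""
    return Fraction(sum(h for _, h in worlds), len(worlds))

def prob_h_given_t(worlds):
    """P(h is true | t): restrict to the circumstances in which t holds."""
    t_worlds = [(t, h) for t, h in worlds if t]
    return Fraction(sum(h for _, h in t_worlds), len(t_worlds))

def entails(worlds):
    """Probabilistic entailment: t entails h when conditioning on t
    raises the probability that h is true, i.e. P(h | t) > P(h)."""
    return prob_h_given_t(worlds) > prob_h(worlds)

# Toy "possible worlds", each a (t holds, h holds) pair
worlds = [(True, True), (True, True), (True, False),
          (False, False), (False, False)]
# Here P(h) = 2/5 while P(h | t) = 2/3, so t entails h under this criterion
```

Under the strict classical definition, by contrast, even one world with t true and h false (as above) would already rule out entailment.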
