Natural Language Inference Task Research Articles

Understanding the biology underpinning the natural regeneration of plant species, in order to plan effective reforestation, is a complex task. It can be aided by access to databases that contain long-term, wide-scale geographical information on species distribution, habitat, and reproduction. Although there exist widely used biodiversity databases that contain structured information on species and their occurrences, such as the Global Biodiversity Information Facility (GBIF) and the Atlas of Living Australia (ALA), the bulk of knowledge about biodiversity remains embedded in textual documents. Unstructured information can be made more accessible and useful for large-scale studies if tools and services automatically extract meaningful information from text and store it in structured formats, e.g., open biodiversity databases, ready to be consumed for analysis (Thessen et al. 2022). We aim to enrich biodiversity occurrence databases with information on species reproductive condition and habitat, derived from text. In previous work, we developed unsupervised approaches to extract related habitats and their locations, and related reproductive conditions and temporal expressions (Gabud and Batista-Navarro 2018). We built a new unsupervised hybrid approach for relation extraction (RE) that combines classical rule-based pattern-matching methods with transformer-based language models, framing the RE task as a natural language inference (NLI) task. Using this hybrid approach, we were able to extract related biodiversity entities from text even without a large training dataset. In this work, we implement an information extraction (IE) pipeline comprising a named entity recognition (NER) tool and our hybrid RE tool.
The NER tool is a transformer-based language model that was pretrained on scientific text and then fine-tuned using COPIOUS (Conserving Philippine Biodiversity by Understanding big data; Nguyen et al. 2019), a gold standard corpus containing named entities relevant to species occurrence. We applied the NER tool to automatically annotate geographical location, temporal expression, and habitat information contained within sentences. A dictionary-based approach is then used to identify mentions of reproductive conditions in text (e.g., phrases such as "fruited heavily" and "mass flowering"). We then use our hybrid RE tool to extract two kinds of entity pairs: a reproductive condition with its temporal expression, and a habitat with its geographical location. We test our IE pipeline on the forestry compendium available in the CABI (Centre for Agriculture and Bioscience International) Digital Library, and show that our work enables the enrichment of descriptive information on reproductive and habitat conditions of species. This work is a step towards enhancing a biodiversity database with habitat and reproductive condition information extracted from text.
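
As a rough illustration of how dictionary matching and NLI-framed relation extraction might fit together, the following Python sketch pairs reproductive conditions with temporal expressions. It is not the authors' implementation: the two-entry dictionary (taken from the example phrases quoted in the abstract), the hypothesis template, the 0.5 threshold, and the function names are all assumptions made for illustration. The `entail_score` argument stands in for any NLI model that returns an entailment probability, and temporal expressions are assumed to have been found already by an NER step.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical mini-dictionary; a real one would hold many more phrases.
REPRODUCTIVE_TERMS = ["fruited heavily", "mass flowering"]


def find_reproductive_conditions(sentence: str) -> List[str]:
    """Dictionary-based matching: return the dictionary phrases found in the sentence."""
    lowered = sentence.lower()
    return [
        term for term in REPRODUCTIVE_TERMS
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    ]


def make_hypothesis(condition: str, time_expr: str) -> str:
    """Assumed template that turns a candidate entity pair into an NLI hypothesis."""
    return f"The event '{condition}' took place in {time_expr}."


def extract_pairs(
    sentence: str,
    time_exprs: List[str],
    entail_score: Callable[[str, str], float],
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    """Keep each (condition, time) pair whose hypothesis the scorer judges entailed."""
    pairs = []
    for condition in find_reproductive_conditions(sentence):
        for time_expr in time_exprs:
            hypothesis = make_hypothesis(condition, time_expr)
            # The sentence is the premise; the templated pair is the hypothesis.
            if entail_score(sentence, hypothesis) >= threshold:
                pairs.append((condition, time_expr))
    return pairs


def stub_scorer(premise: str, hypothesis: str) -> float:
    """Placeholder for an NLI model, hard-coded so the example is deterministic."""
    return 0.9 if "1998" in hypothesis else 0.1


sentence = "The trees fruited heavily in 1998 but not in 2003."
print(extract_pairs(sentence, ["1998", "2003"], stub_scorer))
# prints [('fruited heavily', '1998')]
```

Passing the scorer in as a callable keeps the sketch independent of any particular model; a fine-tuned NLI classifier could be substituted for `stub_scorer` without changing the pairing logic.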

Commonsense natural language inference (CNLI) tasks aim to select the most likely follow-up statement to a contextual description of ordinary, everyday events and facts. Current approaches to transfer learning of CNLI models across tasks require a large amount of labeled data from the new task. This paper presents a way to reduce the need for additional annotated training data from the new task by leveraging symbolic knowledge bases, such as ConceptNet. We formulate a teacher-student framework for mixed symbolic-neural reasoning, with the large-scale symbolic knowledge base serving as the teacher and a trained CNLI model as the student. This hybrid distillation process involves two steps. The first step is a symbolic reasoning process: given a collection of unlabeled data, we use an abductive reasoning framework based on Grenander's pattern theory to create weakly labeled data. Pattern theory is an energy-based graphical probabilistic framework for reasoning among random variables with varying dependency structures. In the second step, the weakly labeled data, along with a fraction of the labeled data, is used to transfer the CNLI model to the new task. The goal is to reduce the fraction of labeled data required. We demonstrate the efficacy of our approach by using three publicly available datasets (OpenBookQA, SWAG, and HellaSwag) and evaluating three CNLI models (BERT, LSTM, and ESIM) that represent different tasks. We show that, on average, we achieve 63% of the top performance of a fully supervised BERT model with no labeled data. With only 1,000 labeled samples, we can improve this performance to 72%. Interestingly, without training, the teacher mechanism itself has significant inference power. The pattern theory framework achieves 32.7% accuracy on OpenBookQA, outperforming transformer-based models such as GPT (26.6%), GPT-2 (30.2%), and BERT (27.1%) by a significant margin.
We demonstrate that the framework can be generalized to successfully train neural CNLI models using knowledge distillation under unsupervised and semi-supervised learning settings. Our results show that the framework outperforms all unsupervised and weakly supervised baselines and some early supervised approaches, while offering competitive performance with fully supervised baselines. Additionally, we show that the abductive learning framework can be adapted for other downstream tasks, such as unsupervised semantic textual similarity, unsupervised sentiment classification, and zero-shot text classification, without significant modification to the framework. Finally, user studies show that the generated interpretations enhance the framework's explainability by providing key insights into its reasoning mechanism.
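
The two-step data flow described above (a symbolic teacher produces weak labels, which are then mixed with a fraction of the labeled data) might be sketched in Python as follows. This is a simplified illustration and does not implement pattern theory: the teacher here is a plain concept-overlap heuristic, and `TOY_KB`, `toy_teacher`, and `build_training_set` are hypothetical names. The toy knowledge base only imitates the concept-to-related-concepts shape of a resource such as ConceptNet.

```python
from typing import Callable, Dict, List, Set, Tuple

Unlabeled = Tuple[str, List[str]]       # (context, candidate follow-ups)
Labeled = Tuple[str, List[str], int]    # adds the index of the chosen follow-up

# Hypothetical toy knowledge base: concept -> related concepts.
TOY_KB: Dict[str, Set[str]] = {
    "rain": {"umbrella", "wet"},
    "kitchen": {"cook", "stove"},
}


def toy_teacher(context: str, candidates: List[str]) -> int:
    """Pick the candidate that mentions the most concepts related to the context.

    A stand-in heuristic, not the energy-based pattern theory inference.
    """
    related: Set[str] = set()
    for word in context.lower().split():
        related |= TOY_KB.get(word, set())
    scores = [len(related & set(c.lower().split())) for c in candidates]
    return scores.index(max(scores))


def build_training_set(
    unlabeled: List[Unlabeled],
    teacher: Callable[[str, List[str]], int],
    labeled: List[Labeled],
    labeled_fraction: float,
) -> List[Labeled]:
    """Step 1: weakly label the unlabeled data. Step 2: add a fraction of gold labels."""
    weak = [(ctx, cands, teacher(ctx, cands)) for ctx, cands in unlabeled]
    keep = int(len(labeled) * labeled_fraction)
    return weak + labeled[:keep]


unlabeled = [("it started to rain", ["she opened her umbrella", "she lit the stove"])]
labeled = [
    ("he went to the kitchen", ["he began to cook", "he got wet"], 0),
    ("the rain stopped", ["the sun came out", "the stove broke"], 0),
]
train = build_training_set(unlabeled, toy_teacher, labeled, labeled_fraction=0.5)
print(len(train))  # 1 weakly labeled example + 1 of the 2 gold examples
```

The resulting list would then be passed to whatever routine fine-tunes the student model; setting `labeled_fraction` to 0.0 corresponds to the setting with no labeled data.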

Articles published on Natural Language Inference Task

Evaluating large language models for user stance detection on X (Twitter)

Evaluating Neural Networks’ Ability to Generalize against Adversarial Attacks in Cross-Lingual Settings

Unsupervised literature mining approaches for extracting relationships pertaining to habitats and reproductive conditions of plant species

Data Set Formation Method for Checking the Quality of Learning Language Models of the Transitive Relation in the Logical Conclusion Problem Context

Abductive natural language inference by interactive model with structural loss

Extracting Reproductive Condition and Habitat Information from Text Using a Transformer-based Information Extraction Pipeline

University Student Dropout Prediction Using Pretrained Language Models

Polish natural language inference and factivity: An expert-based dataset and benchmarks

CsFEVER and CTKFacts: acquiring Czech data for fact verification

Improving Biomedical ReQA With Consistent NLI-Transfer and Post-Whitening.

Active Learning for Name Entity Recognition with External Knowledge

Sequence-to-sequence pretraining for a less-resourced Slovenian language.

Curing the SICK and Other NLI Maladies

CLSEP: Contrastive learning of sentence embedding with prompt

Evaluating Deep Learning Techniques for Natural Language Inference

Leveraging Symbolic Knowledge Bases for Commonsense Natural Language Inference Using Pattern Theory.

Parameter-efficient feature-based transfer for paraphrase identification

The VNNLI - VLSP 2021: Leveraging Contextual Word Embedding for NLI Task on Bilingual Dataset

Supervising Model Attention with Human Explanations for Robust Natural Language Inference

Text Data Augmentation for the Korean Language
