Abstract

This work introduces a new information extraction task called <bold xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Topic-Oriented Fragment Recognition (TOFR)</b> , whose goal is to recognize information related to a specific topic in long documents from professional fields. In this paper, we introduce two TOFR datasets to study the problems of processing long documents. We propose a novel neural framework named <bold xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Zooming Network (ZoomNet)</b> , which overcomes the challenge of combining information over long distances with limited computing resources by flexibly switching between skimming and intensive reading in processing long documents. In general, ZoomNet first establishes a hierarchical representation aligned to the text structure, which relieves the conflict between local information and extensive contextual information. Then, it synthesizes different levels of information to assign tags via multi-scale actions. We combine supervised and reinforcement learning methods to train our model. Experiments show that the proposed model outperforms several state-of-the-art sequence labeling models, including BiLSTM-CRF, BERT, XLNET, RoBERTa, and ELECTRA, on both TOFR datasets with big margins.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call