Abstract

In the judicial field, the growing volume of legal text data makes the extraction of legal text elements increasingly important. In this paper, we propose a sentence-level model for legal text element extraction based on a multilabel text classification framework. The proposed model consists of an encoder and an improved decoder. The encoder uses multilevel convolutional neural networks (CNN) and Long Short-Term Memory (LSTM) as feature extraction networks to capture local neighborhood and context information from legal text. The decoder uses an LSTM with multiattention and a fully connected layer with an improved initialization method to decode and generate label sequences. To the best of our knowledge, this is one of the first attempts to apply a multilabel classification algorithm to element extraction from legal text. To verify the effectiveness of our model, we conduct experiments not only on three real legal text datasets but also on a general multilabel text classification dataset. The experimental results show that our model outperforms the baseline models on the legal text datasets and is competitive with them on the general multilabel text classification dataset. This indicates that the model is useful for multilabel classification of both ordinary texts and legal texts, which are short and whose words contain a variable number of characters.
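To illustrate how a convolutional encoder can capture "local neighborhood" information, here is a minimal, framework-free Python sketch of parallel 1D convolutions of different widths over character embeddings, each followed by ReLU and max-over-time pooling. The abstract does not say what "multilevel" means or which filter widths are used. Reading it as filters of several widths is our assumption. The function names, weights, and toy embeddings are hypothetical, and the LSTM branch is not shown. A real model would use a deep-learning framework with learned weights.

```python
import math
from typing import List, Sequence

Vector = List[float]
Kernel = List[Vector]  # one weight vector per position in the window


def conv1d_max_pool(embeddings: Sequence[Vector], kernel: Kernel, bias: float = 0.0) -> float:
    """Slide one filter over the sequence, apply ReLU, and max-pool over time."""
    width = len(kernel)
    if len(embeddings) < width:
        raise ValueError("sequence is shorter than the kernel width")
    best = -math.inf
    for start in range(len(embeddings) - width + 1):
        # Dot product between the filter and the window of `width` embeddings.
        activation = bias
        for offset, weights in enumerate(kernel):
            token = embeddings[start + offset]
            activation += sum(w * x for w, x in zip(weights, token))
        best = max(best, max(0.0, activation))  # ReLU, then keep the running max
    return best


def multi_width_features(embeddings: Sequence[Vector], kernels: Sequence[Kernel]) -> List[float]:
    """Concatenate the max-pooled responses of filters that may have different widths."""
    return [conv1d_max_pool(embeddings, kernel) for kernel in kernels]


if __name__ == "__main__":
    # Toy 2-dimensional embeddings for a 4-character sentence (hypothetical values).
    sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    filters = [
        [[1.0, 1.0]],               # width 1 (unigram-like)
        [[1.0, 0.0], [0.0, 1.0]],   # width 2 (bigram-like)
        [[-1.0, -1.0]] * 3,         # width 3, never activated on this input
    ]
    print(multi_width_features(sentence, filters))  # [2.0, 2.0, 0.0]
```

Filters of different widths respond to character n-grams of different lengths. This is one plausible reason a convolutional front end suits text whose words have a variable number of characters.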

Highlights

  • With the development of the economy, civil legal disputes are increasing and legal practitioners must handle more and more legal texts, yet the number of legal practitioners has not grown with the number of documents

  • We propose a model based on multilevel convolutional networks for sentence-level element extraction from legal text

  • Our proposed model combines local semantic information obtained by multilevel convolutional neural networks (CNN) with context information obtained by Long Short-Term Memory (LSTM), using a multiattention network to generate higher-level semantic representations of sentences
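The highlight above describes fusing CNN (local) and LSTM (context) features through attention. The following minimal Python sketch shows one way that could look. It runs one scaled dot-product attention pass per feature source and concatenates the pooled vectors. The paper's exact multiattention formulation is not given here. Using scaled dot-product attention, letting states act as both keys and values, and taking the query from somewhere like a decoder hidden state are all our assumptions. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(query: Vector, states: Sequence[Vector]) -> Vector:
    """Scaled dot-product attention of one query over a sequence of states.

    The states serve as both keys and values, so the result is a
    weighted average of the states.
    """
    dim = len(query)
    scores = [sum(q * s for q, s in zip(query, state)) / math.sqrt(dim) for state in states]
    weights = softmax(scores)
    return [sum(w * state[i] for w, state in zip(weights, states)) for i in range(dim)]


def fuse_local_and_context(query: Vector, cnn_states: Sequence[Vector],
                           lstm_states: Sequence[Vector]) -> Vector:
    """Run one attention pass per feature source and concatenate the results."""
    return attention_pool(query, cnn_states) + attention_pool(query, lstm_states)


if __name__ == "__main__":
    query = [0.0, 0.0]                      # zero query -> uniform attention weights
    cnn_states = [[1.0, 2.0], [1.0, 2.0]]   # local (n-gram) features, hypothetical
    lstm_states = [[0.0, 0.0], [4.0, 4.0]]  # contextual features, hypothetical
    print(fuse_local_and_context(query, cnn_states, lstm_states))  # [1.0, 2.0, 2.0, 2.0]
```

Keeping a separate attention pass for each source lets the model weight local and contextual evidence independently before they are combined.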


Summary

Introduction

With the development of the economy, civil legal disputes have become increasingly common and legal practitioners must handle a growing volume of legal texts, yet the number of legal practitioners has not grown with the number of documents. In recent years, China's judicial field has faced a large number of cases and a small number of legal practitioners. Easing this imbalance and improving practitioners' efficiency requires automated techniques that extract sentence-level elements from legal texts, so that practitioners can quickly grasp the important information they contain. Relatively little research has addressed element extraction from legal texts. Legal text element extraction is defined as assigning labels with specific legal attributes to each sentence in a legal text according to the semantic information the sentence expresses. Because a single sentence may express several elements at once, element extraction from legal text can be regarded as a multilabel classification (MLC) problem rather than a multiclass classification (MCC) problem.
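The following minimal Python sketch makes the MLC/MCC distinction concrete by contrasting the two decision rules for a single sentence. MCC picks exactly one label, while MLC decides each label independently and yields a set that is encoded as a multi-hot vector. The label names, scores, and the 0.5 threshold are hypothetical and do not come from the paper's datasets. The proposed model instead generates label sequences with an attention-based LSTM decoder, so this sketch only shows the difference in output structure.

```python
import math
from typing import Dict, List, Sequence, Set


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def multiclass_predict(logits: Dict[str, float]) -> str:
    """MCC: every sentence receives exactly one label (the argmax)."""
    return max(logits, key=logits.__getitem__)


def multilabel_predict(logits: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """MLC: each label is decided independently, so a sentence may receive
    zero, one, or several labels."""
    return {label for label, z in logits.items() if sigmoid(z) >= threshold}


def to_multi_hot(labels: Set[str], label_space: Sequence[str]) -> List[int]:
    """Encode a label set as a 0/1 target vector in a fixed label order."""
    return [1 if label in labels else 0 for label in label_space]


if __name__ == "__main__":
    # Hypothetical element labels and model scores for one sentence.
    label_space = ["loan_amount", "interest_agreed", "repayment_overdue"]
    logits = {"loan_amount": 2.0, "interest_agreed": 1.0, "repayment_overdue": -3.0}
    print(multiclass_predict(logits))            # loan_amount
    predicted = multilabel_predict(logits)
    print(sorted(predicted))                     # ['interest_agreed', 'loan_amount']
    print(to_multi_hot(predicted, label_space))  # [1, 1, 0]
```

Under MCC the second element (`interest_agreed`) would be lost. MLC keeps both elements the sentence expresses.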
