Text Classification in Law Area: a Systematic Review

V S Martins, C D Silva

doi:10.5753/kdmile.2021.17458

Abstract

Automatic text classification can greatly improve workflows in the legal domain, particularly in the migration from physical to electronic lawsuits. A systematic review of studies on text classification in the legal domain from January 2017 to February 2020 was conducted. The search strategy identified 20 studies, which were analyzed and compared. The review addresses the following research questions: what the state-of-the-art language models are; how they have been applied to text classification on English and Brazilian Portuguese legal datasets; whether language models trained on Brazilian Portuguese are available; and whether datasets from the Brazilian legal domain are available. It concludes that automatic text classification is being applied in Brazil, although the use of language models lags behind studies on English-language datasets; that in-domain pre-training of language models is important for improving results; and that two studies make Brazilian Portuguese language models available, while one introduces a dataset from the Brazilian legal domain.

Full Text