Multi-view overlapping clustering for the identification of the subject matter of legal judgments

Graziella De Martino, Gianvito Pio, Michelangelo Ceci

doi:10.1016/j.ins.2023.118956

Abstract

The legal field is generally burdened by paper-heavy activities, and managing massive amounts of legal judgments without computational tools may compromise the effectiveness and efficiency of administration processes. In this paper, we propose MOSTA, a novel unsupervised method that supports the automated identification of groups of legal judgments with similar characteristics, with the goal of reducing the manual effort required to manage them. Methodologically, MOSTA learns two different embedding models for legal judgments. The first represents the semantics of the textual content, while the second represents co-citations of legal acts, also taking into account the granularity of the citations. These representations are then fused through a multi-view approach based on an autoencoder, and the resulting representation is finally exploited by a novel overlapping clustering algorithm. The latter is an additional strong point of MOSTA since, contrary to existing approaches, it does not rely on additional input parameters that inherently influence the degree of overlap of the resulting clusters. Our experiments, performed on three textual datasets, including a real-world legal dataset provided by EUR-Lex, showed that the proposed representation of cited legal acts, the adopted multi-view fusion strategy, and the novel overlapping clustering algorithm implemented in MOSTA each contribute positively to the quality of the identified clusters. Finally, MOSTA outperformed, by a large margin, existing complete solutions based on fine-tuned BERT embedding models and existing overlapping clustering algorithms.

Full Text