Abstract

Many of the criminal cases analysed by the Prosecution Office of the Federal District and Territories are repetitive, and processing them can be streamlined by providing similar previous cases as templates. We investigate the use of information retrieval techniques to automate the identification of similar cases, and we evaluate whether semantic search performs better than lexical search in the task of assisting legal opinion writing. As a proof of concept, syntactic indexing techniques (TF-IDF and BM25) and semantic indexing techniques (Latent Semantic Indexing, LSI, and Latent Dirichlet Allocation, LDA) were evaluated using document collections from two public prosecutor's offices. In addition, we evaluate model enrichment using recorded data about the cases and the legal norm citations observed in the documents. Baseline collections sampled from the full document collections of the two offices were used for model evaluation, with Normalized Discounted Cumulated Gain (NDCG) as the metric. We conclude that there is no significant performance difference between the semantic and syntactic indexing techniques. In addition, we observe no significant performance gain from model enrichment. We chose BM25 as the most adequate technique because it offers a good balance between performance and simplicity.
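The abstract does not describe the implementation, so the following is only a minimal sketch of the two pieces it names: BM25 scoring of documents against a query, and NDCG over a ranked list of graded relevance judgments. The function names, the toy tokenized documents, the parameter values k1 = 1.5 and b = 0.75, and the choice of a non-negative IDF variant are all assumptions made for illustration, not details taken from the paper.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with BM25.

    k1 and b are common default values, assumed here for illustration.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Non-negative IDF variant (an assumption; other variants exist).
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = 1 - b + b * len(d) / avgdl
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


def dcg(relevances):
    """Discounted cumulated gain of relevance grades in ranked order."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances, k=None):
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    ranked = relevances[:k] if k else relevances
    ideal = sorted(relevances, reverse=True)[:len(ranked)]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical toy collection: each document is a list of tokens.
docs = [
    ["theft", "vehicle"],
    ["fraud", "document"],
    ["theft", "theft", "store"],
]
scores = bm25_scores(["theft"], docs)
print(scores)

# Hypothetical relevance grades for a ranked result list, best first.
print(ndcg([3, 2, 1]))
print(ndcg([1, 2, 3]))
```

In an evaluation of the kind the abstract describes, the ranking produced by each indexing technique would be scored with NDCG against the relevance judgments of the baseline collection, and the resulting values compared across techniques.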
