Abstract

In this paper we investigate three different methods for several legal document retrieval and entailment tasks: new low-complexity pre-trained embeddings trained specifically on documents in the legal domain, transformer models, and boosting algorithms. Task 1, a case law retrieval task, utilized a pairwise CatBoost model, resulting in an F1 score of 0.04. Task 2, a case law entailment task, utilized a combination of BM25+, embeddings, and natural language inference (NLI) features, achieving an F1 of 0.6180 and the third-best result in the competition. Task 3, a statutory information retrieval task, utilized the aforementioned pre-trained embeddings in combination with TF-IDF features, resulting in an F2 score of 0.4546. Lastly, Task 4, a statutory entailment task, utilized BERT embeddings with XGBoost and achieved an accuracy of 0.5357. Our findings illustrate that legal embeddings and auxiliary linguistic features, such as NLI, show the most promise for future improvements.

Keywords: Legal information retrieval, Textual entailment, Classification, Natural language inference, Ranking, Legal embeddings, BERT, Boosting
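To make the general pattern behind these systems concrete, the sketch below shows how a lexical score (BM25+) and an embedding-style similarity can be stacked as features for a boosted classifier over query–candidate pairs. It is a minimal illustration only, not the authors' pipeline: the library choices (rank_bm25, scikit-learn, xgboost), the TF-IDF stand-in for the legal embeddings, and all variable names are assumptions introduced for this example.

```python
# Minimal sketch: combine a BM25+ score and a similarity feature, then train
# a boosted classifier. All names and library choices are assumptions made
# for illustration; this is not the paper's actual implementation.
import numpy as np
from rank_bm25 import BM25Plus
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from xgboost import XGBClassifier

# Toy query/candidate pairs with entailment labels (1 = entails, 0 = does not).
queries = ["the tenant must pay rent monthly", "the seller bears the risk"]
candidates = ["rent is due every month", "the buyer bears the risk"]
labels = np.array([1, 0])

# Lexical feature: BM25+ score of each candidate against its paired query.
bm25 = BM25Plus([c.split() for c in candidates])
bm25_scores = np.array(
    [bm25.get_scores(q.split())[i] for i, q in enumerate(queries)]
)

# Similarity feature: TF-IDF cosine similarity stands in here for the legal
# embeddings described in the abstract (a deliberate simplification).
vec = TfidfVectorizer().fit(queries + candidates)
sims = cosine_similarity(vec.transform(queries), vec.transform(candidates)).diagonal()

# Stack features and fit a gradient-boosted classifier (XGBoost in this sketch).
X = np.column_stack([bm25_scores, sims])
clf = XGBClassifier(n_estimators=50, max_depth=3, eval_metric="logloss")
clf.fit(X, labels)
print(clf.predict(X))
```

The same feature-stacking idea carries over to the other tasks, e.g. swapping CatBoost for XGBoost or adding NLI-derived scores as extra columns in the feature matrix.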
