Abstract

Computing the similarity between two legal documents is an important challenge in the Legal Information Retrieval domain. Efficient calculation of this similarity has useful applications in tasks such as identifying relevant prior cases for a given case document. Prior works have proposed network-based and text-based methods for measuring similarity between legal documents. However, these methods have limitations. Network-based measures are not always meaningfully applicable, since legal citation networks are usually very sparse. On the other hand, only primitive text-based similarity measures, such as TF-IDF-based approaches, have been tried to date. In this work, we focus on improving text-based methodologies for computing the similarity between two legal documents. In addition to TF-IDF-based measures, we use advanced similarity measures (such as topic modeling) and neural network models (such as word embeddings and document embeddings). We perform extensive experiments on a large dataset of Indian Supreme Court cases and compare various methodologies for measuring the textual similarity of legal documents. Our experiments show that embedding-based approaches perform better than the other approaches. We also demonstrate that the proposed embedding-based methodologies significantly outperform a baseline hybrid methodology involving both network-based and text-based similarity.
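To make the TF-IDF baseline mentioned above concrete, the following is a minimal sketch of TF-IDF cosine similarity between documents, written with the Python standard library only. It is an illustration under stated assumptions, not the configuration used in the experiments: the whitespace tokenizer, the smoothed IDF formula, and the function names `tfidf_vectors` and `cosine` are all choices made here for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    # Assumption: lowercase whitespace tokenization; real legal text would
    # need proper tokenization, stop-word handling, and normalization.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy sentences, not drawn from the dataset described above.
docs = [
    "the appellant filed an appeal against the conviction",
    "the appeal against the conviction was dismissed",
    "the lease agreement was terminated by the landlord",
]
v0, v1, v2 = tfidf_vectors(docs)
print(cosine(v0, v1), cosine(v0, v2))
```

In this toy example the first two sentences share several case-specific terms, so their score is higher than the score between the first and third. A measure of this kind only rewards exact term overlap, which is one reason embedding-based representations can behave differently on legal text.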
