Abstract

In the legal domain, Named Entity Recognition (NER) serves as the basis for subsequent stages of legal artificial intelligence. In this paper, the authors develop a dataset for training NER models in the Indian legal domain. As a first step of the research methodology, a study is conducted to identify and establish legal entities beyond the commonly used named entities such as person, organization, and location. Annotators can use these entities to annotate different types of legal documents. A variety of text annotation tools exist, and finding the best one is difficult, so the authors experimented with various tools before settling on the one best suited to this research work. The resulting annotations of unstructured text can be stored in JavaScript Object Notation (JSON) format, which improves data readability and makes manipulation simple. After annotation, the resulting dataset contains approximately 30 documents and approximately 5000 sentences. This data is then used to train a spaCy pre-trained pipeline to predict legal named entities accurately. The accuracy of legal named entity prediction can be increased further if the pre-trained models are fine-tuned on legal texts.
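
To make the JSON storage step concrete, the following is a minimal sketch of how one annotated sentence might be serialized with character offsets. The record layout, the helper `make_record`, the labels `COURT` and `PETITIONER`, and the example sentence are all illustrative assumptions; the abstract does not specify the schema or label set the authors actually used.

```python
import json


def make_record(text, spans):
    """Build one annotation record (hypothetical schema).

    `spans` is a list of (surface_string, label) pairs; each surface string
    is located by its first occurrence in `text`.
    """
    entities = []
    for surface, label in spans:
        start = text.find(surface)
        if start == -1:
            raise ValueError(f"span not found in text: {surface!r}")
        entities.append(
            {"start": start, "end": start + len(surface), "label": label}
        )
    return {"text": text, "entities": entities}


# Invented example sentence and labels, for illustration only.
record = make_record(
    "The Supreme Court of India dismissed the appeal filed by Ramesh Kumar.",
    [("Supreme Court of India", "COURT"), ("Ramesh Kumar", "PETITIONER")],
)

# Round-trip through JSON to show the record survives serialization.
serialized = json.dumps(record, indent=2)
restored = json.loads(serialized)

for ent in restored["entities"]:
    print(restored["text"][ent["start"]:ent["end"]], ent["label"])
```

Storing character offsets rather than copied strings keeps each annotation tied to an exact position in the text, which is the form most NER training tools expect; converting such records into a specific library's training format would be a separate step.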
