Abstract

Text summarization is highly applicable in the legal domain because legal documents are complex. Several classical algorithms show promising results on legal documents. One such algorithm is Kullback-Leibler based summarization (KLSumm), in which documents and candidate summaries are represented as unigram probability distributions and summary sentences are chosen to minimize the KL-divergence between the document and the candidate summary. The choice of probability distribution therefore strongly affects which sentences are selected. In this work, two approaches are explored for improving how these probability distributions are formed, viz. NgramKLSumm and BertKLSumm. The experimental results show that NgramKLSumm performs better on the BillSum dataset in terms of both ROUGE and BERTScore, whereas BertKLSumm performs better on the FIRE dataset in terms of ROUGE. On BillSum, ROUGE improves by around 5-13% with BertKLSumm and by nearly 10-16% with NgramKLSumm. In terms of BERTScore, the F1-score improves by around 2% with BertKLSumm and by 6% with NgramKLSumm. On FIRE, a 2-5% improvement is seen in ROUGE, while no improvement is seen in BERTScore. These results indicate that an enhanced representation of documents and candidate summaries can yield improvements across different datasets and evaluation metrics, thereby improving on the baseline KLSumm approach for legal document summarization.
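The baseline selection procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes whitespace tokenization, additive smoothing (so the divergence stays finite), a fixed sentence budget, greedy selection, and the direction KL(document || summary). All function names and parameters (`unigram_dist`, `kl_divergence`, `kl_summarize`, `alpha`, `max_sentences`) are hypothetical.

```python
import math
from collections import Counter


def unigram_dist(tokens, vocab, alpha=0.01):
    """Smoothed unigram distribution over a fixed vocabulary.

    alpha is an assumed additive-smoothing constant; it keeps every
    probability positive so the KL-divergence is always finite.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same keys."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)


def kl_summarize(sentences, max_sentences=2):
    """Greedily add the sentence that most reduces KL(document || summary)."""
    tokenized = [s.lower().split() for s in sentences]
    doc_tokens = [t for sent in tokenized for t in sent]
    vocab = set(doc_tokens)
    doc_dist = unigram_dist(doc_tokens, vocab)

    chosen = []          # indices of selected sentences
    summary_tokens = []  # tokens of the summary built so far
    while len(chosen) < min(max_sentences, len(sentences)):
        best_i, best_kl = None, float("inf")
        for i, sent in enumerate(tokenized):
            if i in chosen:
                continue
            cand_dist = unigram_dist(summary_tokens + sent, vocab)
            kl = kl_divergence(doc_dist, cand_dist)
            if kl < best_kl:
                best_i, best_kl = i, kl
        chosen.append(best_i)
        summary_tokens += tokenized[best_i]

    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]
```

In this sketch, changing how the distributions are formed only requires replacing `unigram_dist`, which is the component the two proposed approaches target; how NgramKLSumm and BertKLSumm actually build their distributions is not specified here.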
