Abstract

Large Language Models (LLMs) face limitations in logical reasoning, which restrict their applicability in critical domains such as law. Because current evaluation methods are often overly simple, they can yield inaccurate assessments of LLMs' capabilities. This paper presents a refined evaluation method for assessing LLMs' capability to answer legal questions, one that eliminates the possibility of a model obtaining correct responses by chance. Furthermore, we introduce the LogiLaw dataset, which aims to strengthen models' logical reasoning capacities in general and legal reasoning in particular. By leveraging the refined evaluation technique, the LogiLaw dataset, and the proposed Reinforcement Learning from Logical Feedback (RLLF) approach, our work aims to open new avenues for research that bolsters LLMs' performance in law and other logic-intensive disciplines while addressing the shortcomings of conventional evaluation approaches.
