Abstract

The rapid development of artificial intelligence (AI) and growing interest in how large language models (LLMs) can maximize an organization's opportunities are accompanied by ethical concerns: the ability of LLMs to generate human-like text raises the risk of disinformation and fake news. It is therefore crucial to develop evaluation benchmarks that account for these social and ethical implications. A central challenge is that LLMs lack awareness of their own limitations yet persist in producing responses to the best of their capabilities, often yielding seemingly plausible but ultimately incorrect answers and hindering the deployment of reliable generative AI in industry. This paper examines the metrics used to evaluate the performance of machine-learning models, focusing specifically on LLMs. A bibliometric analysis was conducted to explore the techniques and methods used to evaluate large language models and to identify the specific areas of focus in such evaluations. The results show that natural language processing systems, information classification, and computational linguistics are among the techniques applied to the evaluation of large language models. This work paves the way for future investigations involving large language models.
