WITHDRAWN: Evaluating the generation capabilities of large Chinese language models

Hui Zeng,Jingyuan Xue,Meng Hao,Chen Sun,Bin Ning,Na Zhang

doi:10.1016/j.aiopen.2024.02.002

Abstract

This paper introduces CG-Eval, the first comprehensive, automated framework for evaluating the generative capabilities of large Chinese language models across a range of academic disciplines. CG-Eval automatically assesses how well models generate precise and contextually relevant answers to a diverse set of questions in six domains: Science and Engineering, Humanities and Social Sciences, Mathematical Calculations, Medical Practitioner Qualification Examination, Judicial Examination, and Certified Public Accountant Examination. We also introduce Gscore, a composite index computed as a weighted sum of multiple metrics. Gscore automates the measurement of how closely a model's generated text matches the reference answers, giving a detailed assessment of model performance. This automation makes the evaluation process more efficient and scalable, and it keeps assessment objective and consistent across models. The detailed test data and results, including the comparative performance of the evaluated models, are available at http://cgeval.besteasy.com/.
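
The abstract describes Gscore only as a weighted sum of multiple metrics and does not name the metrics or their weights. The following is a minimal sketch of that general idea, not the authors' implementation: the function name `weighted_composite`, the metric names `metric_a` and `metric_b`, and the weights are all hypothetical placeholders.

```python
from typing import Mapping


def weighted_composite(scores: Mapping[str, float],
                       weights: Mapping[str, float]) -> float:
    """Combine per-metric scores into one index as a plain weighted sum.

    Both mappings must use exactly the same metric names. The names and
    weights are illustrative assumptions; the abstract does not specify
    which metrics or weights Gscore uses.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same metrics")
    return sum(weights[name] * scores[name] for name in scores)


# Hypothetical per-metric scores for one generated answer, each in [0, 1].
example_scores = {"metric_a": 0.5, "metric_b": 1.0}
# Hypothetical weights; choosing weights that sum to 1 keeps the result in [0, 1].
example_weights = {"metric_a": 0.25, "metric_b": 0.75}

composite = weighted_composite(example_scores, example_weights)
print(composite)  # 0.25 * 0.5 + 0.75 * 1.0 = 0.875
```

Because every metric is computed against a reference answer without human judgment, a score of this shape can be produced automatically for each model and question, which is the property the abstract emphasizes.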

Journal: AI Open
Publication Date: Mar 1, 2024
License type: cc-by-nc-nd
