A Benchmark Dataset to Distinguish Human-Written and Machine-Generated Scientific Papers

Mohamed Hesham Ibrahim Abdalla, Daryna Dementieva, Simon Malberg, Georg Groh, Edoardo Mosca

doi:10.3390/info14100522

Open Access

https://doi.org/10.3390/info14100522

Journal: Information
Publication Date: Sep 26, 2023
Citations: 3
License type: CC BY 4.0

Affiliation: Technical University of Munich

Abstract

As generative NLP can now produce content nearly indistinguishable from human writing, it is becoming difficult to identify genuine research contributions in academic writing and scientific publications. Moreover, information in machine-generated text can be factually wrong or even entirely fabricated. In this work, we introduce a novel benchmark dataset containing human-written and machine-generated scientific papers from SCIgen, GPT-2, GPT-3, ChatGPT, and Galactica, as well as papers co-created by humans and ChatGPT. We also experiment with several types of classifiers—linguistic-based and transformer-based—for detecting the authorship of scientific text. A strong focus is put on generalization capabilities and explainability to highlight the strengths and weaknesses of these detectors. Our work makes an important step towards creating more robust methods for distinguishing between human-written and machine-generated scientific papers, ultimately ensuring the integrity of scientific literature.
