Abstract

The authors of this article identify distinctive features of texts written by humans versus texts generated by the GPT-3 neural network. Texts generated by GPT-3 have not yet been subjected to systematic in-depth study. In total, 160 texts were analyzed, distributed across four topics (“Higher Education in My Eyes,” “How to Remain Human in Inhuman Conditions,” “How I Spent the Summer,” “Teacher of the Year”): 80 texts generated by the neural network and 80 written by humans. The texts were analyzed using quantitative linguistic methods: a concordance was compiled for each text in the AntConc program, and quantitative values were obtained from it for further analysis. The authors reached the following conclusions: (1) in the generated texts, words from the title occur with the highest frequency; (2) the relative frequency of title words is disproportionately inflated; (3) the list of the 20 most frequent words across all generated texts contains the highest number of content words; (4) the lexical diversity coefficient of the natural texts examined is significantly higher than that of the generated texts. The findings of this research can be useful both to educators and to machine learning specialists.
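The measures named in the abstract (word-frequency lists and a lexical diversity coefficient) can be sketched in a few lines of Python. This is not the authors' actual pipeline, which used AntConc; the function names and the type-token ratio as the diversity measure are assumptions for illustration only.

```python
from collections import Counter
import re

def word_frequencies(text):
    # Tokenize into lowercase word tokens -- a crude stand-in for an
    # AntConc-style word list built from a concordance.
    tokens = re.findall(r"[a-zа-яё']+", text.lower())
    return tokens, Counter(tokens)

def lexical_diversity(tokens):
    # Type-token ratio: distinct word forms divided by total tokens.
    # (Assumed measure; the article's exact coefficient is not specified here.)
    return len(set(tokens)) / len(tokens) if tokens else 0.0

# Hypothetical sample text for demonstration.
sample = "How I spent the summer: I spent the summer reading and the summer was long"
tokens, freq = word_frequencies(sample)
top_20 = freq.most_common(20)       # most frequent words, as in conclusion (3)
ttr = lexical_diversity(tokens)     # diversity coefficient, as in conclusion (4)
```

Comparing `top_20` and `ttr` between a human-written corpus and a generated corpus would reproduce the kind of contrast the authors report.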
