Recent advances in Large Language Models (LLMs) have made it possible to generate social media content that mimics human writing. The potential misuse of these tools raises urgent concerns and motivates the need to better understand the structure and patterns of LLM-generated content. Human communication on the Internet has developed its own linguistic adaptations, including the use of emoji to augment traditional text. This study investigates the ability of one LLM, OpenAI’s GPT-3.5, to replicate human emoji usage in social media contexts. Drawing upon a dataset of nearly three thousand US English human-written tweets, we employed GPT-3.5-Turbo to generate social-media-style content and analyzed the use of emoji in the resulting text. We compared emoji usage between the LLM-generated and the human-written datasets along three dimensions: frequency, the types of emoji commonly used, and emoji sequences (n-grams). Our results revealed notable differences on all three. While human-written tweets were more likely to use faces, hearts, and repetitive sequences of emoji, LLM-created content drew on a broader variety of emoji and favored literal representations of the text’s subject matter, producing diverse and unique emoji combinations.
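To make the emoji-sequence comparison concrete, the following minimal sketch counts emoji n-grams over a list of texts. It is an illustration under stated assumptions, not the procedure used in the study: the function names `emoji_ngrams` and `ngram_counts` are hypothetical, the sample strings are invented, and the emoji matcher covers only a few common Unicode blocks.

```python
import re
from collections import Counter

# Assumption: a rough emoji matcher covering the main pictographic blocks only.
# It ignores variation selectors, zero-width joiners, skin-tone handling, and
# flags, so a production analysis would need a fuller emoji definition.
EMOJI_CLASS = "[\U0001F300-\U0001FAFF\u2600-\u27BF]"
EMOJI_RUN = re.compile(f"{EMOJI_CLASS}+")


def emoji_ngrams(text, n):
    """Return every length-n window taken from each run of adjacent emoji."""
    grams = []
    for run in EMOJI_RUN.findall(text):
        # Slide a window of size n across the run of consecutive emoji.
        grams.extend(run[i:i + n] for i in range(len(run) - n + 1))
    return grams


def ngram_counts(texts, n):
    """Aggregate emoji n-gram counts over a collection of texts."""
    counts = Counter()
    for text in texts:
        counts.update(emoji_ngrams(text, n))
    return counts


if __name__ == "__main__":
    # Invented examples, not drawn from the study's dataset.
    samples = ["so funny 😂😂😂", "pizza night 🍕🔥"]
    for gram, count in ngram_counts(samples, 2).most_common():
        print(gram, count)
```

Running the same counter over a human-written corpus and an LLM-generated one, then comparing the most common n-grams, is one way to surface the kind of contrast described above, such as repeated emoji versus varied combinations.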