More effort is being put into improving the capabilities of Large Language Models (LLMs) than into dealing with their implications. Current LLMs can generate high-quality texts that are seemingly indistinguishable from those written by human experts. While offering great potential, such breakthroughs also pose new challenges for the safe and ethical use of LLMs in education, science, and many other areas. However, the majority of current approaches to LLM text detection are either computationally expensive or require access to the LLMs' internal computations, both of which hinder their public accessibility. With this motivation, this article presents a novel metric learning paradigm for detecting LLM-generated texts that balances computational cost, accessibility, and performance. Specifically, detection is based on learning a similarity function between a given text and an equivalent example generated by an LLM. This function outputs high values for LLM-LLM text pairs and low values for LLM-human text pairs. In terms of architecture, the detection framework combines a pre-trained language model for text embedding with a newly designed deep metric model. The metric component can be trained on triplets or pairs of same-context instances to increase the distance between human and LLM texts while reducing the distance among LLM texts. For benchmarking, we develop five datasets totaling more than 95,000 contexts, each with a triplet of responses: one written by a human and two generated by GPT-3.5 TURBO or GPT-4 TURBO. Experiments show that our best architectures maintain F1 scores between 0.87 and 0.95 across the tested corpora in multiple experimental settings. The metric framework also requires significantly less time for training and inference than RoBERTa, LLaMA 3, Mistral v0.3, and Ghostbuster, while achieving 90% to 150% of the performance of the best benchmark.
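The abstract does not specify the loss function or the similarity measure used by the metric model. As a rough illustration only, the Python sketch below assumes that the embedding and metric stages map each text to a vector, and that training uses a standard triplet margin loss over cosine distance. In this setup, an LLM response serves as the anchor, a second LLM response to the same context serves as the positive, and the human response serves as the negative. The names `cosine_similarity`, `triplet_loss`, and `is_llm_generated`, along with the `margin` and `threshold` values, are hypothetical. The pre-trained embedding model and the deep metric network themselves are omitted, so the input vectors simply stand in for their outputs.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 0.5) -> float:
    """Standard triplet margin loss over cosine distance (an assumption,
    not necessarily the paper's loss).

    anchor:   embedding of an LLM response (hypothetical input)
    positive: embedding of another LLM response to the same context
    negative: embedding of the human response to the same context
    The loss is zero once the human text is at least `margin` farther
    from the anchor than the other LLM text is.
    """
    d_pos = 1.0 - cosine_similarity(anchor, positive)
    d_neg = 1.0 - cosine_similarity(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def is_llm_generated(candidate: Vector, llm_reference: Vector,
                     threshold: float = 0.8) -> bool:
    """Flag a text as LLM-generated when it is highly similar to an
    LLM-written example for the same context (hypothetical threshold)."""
    return cosine_similarity(candidate, llm_reference) >= threshold
```

For example, with toy vectors `llm_a = [1.0, 0.0]`, `llm_b = [0.9, 0.1]`, and `human = [0.0, 1.0]`, `triplet_loss(llm_a, llm_b, human)` is `0.0` because the triplet already satisfies the margin. In contrast, `is_llm_generated(llm_b, llm_a)` is `True` and `is_llm_generated(human, llm_a)` is `False`. A fixed threshold is only one possible decision rule; a trained classifier on top of the similarity score would be an equally plausible design.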