More effort is being put into improving Large Language Models' (LLMs) capabilities than into dealing with their implications. Current LLMs can generate high-quality texts that are seemingly indistinguishable from those written by human experts. While offering great potential, such breakthroughs also pose new challenges for the safe and ethical use of LLMs in education, science, and many other areas. Moreover, most current approaches to LLM text detection are either computationally expensive or require access to the LLMs' internal computations, both of which hinder public accessibility. Motivated by these issues, this paper presents a novel metric learning paradigm for detecting LLM-generated texts that balances computational cost, accessibility, and performance. Specifically, detection is based on learning a similarity function between a given text and an equivalent LLM-generated example, which outputs high values for LLM-LLM text pairs and low values for LLM-human text pairs. Architecturally, the detection framework consists of a pretrained language model for text embedding and a newly designed deep metric model. The metric component can be trained on triplets or pairs of same-context instances to widen the distances between human texts and LLM texts while reducing those among LLM texts. We further develop five benchmark datasets totalling over 95,000 contexts, each with a triplet of responses: one written by a human and two generated by GPT-3.5 TURBO or GPT-4 TURBO. Experiments show that our best architectures maintain F1 scores between 0.87 and 0.95 across the tested corpora in multiple settings. The metric framework also requires significantly less training and inference time than RoBERTa, LLaMA 3, Mistral v0.3, and Ghostbuster, while retaining 90% to 150% of the best baseline's performance.
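The core idea of the abstract, a similarity function trained with a triplet objective so that LLM-LLM pairs score high and LLM-human pairs score low, can be sketched as follows. This is a minimal illustration, not the paper's actual architecture: the function names, cosine similarity choice, margin value, and decision threshold are all assumptions, and the deep metric model over pretrained embeddings is abstracted away.

```python
import math

def cosine(u, v):
    # Cosine similarity stands in for the learned similarity function.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def triplet_loss(anchor, positive, negative, margin=0.5):
    # anchor and positive: embeddings of two LLM responses to the same
    # context; negative: embedding of the human response. The loss is
    # zero once LLM-LLM similarity exceeds LLM-human similarity by at
    # least `margin` (margin value is a hypothetical choice).
    return max(0.0, cosine(anchor, negative) - cosine(anchor, positive) + margin)

def predict_is_llm(text_emb, llm_ref_emb, threshold=0.5):
    # Detection rule: high similarity to an LLM-generated reference for
    # the same context => classify the text as LLM-generated.
    # The threshold is illustrative, not from the paper.
    return cosine(text_emb, llm_ref_emb) > threshold
```

In practice the embeddings would come from the pretrained language model, and the metric component would be a trained network rather than raw cosine similarity; the triplet objective itself is the standard margin formulation the abstract alludes to.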