Exploring the potential of large language models for author profiling tasks in digital text forensics

Sang-Hyun Cho, Dohyun Kim, Hyuk-Chul Kwon, Minho Kim

doi:10.1016/j.fsidi.2024.301814

Abstract

The rapid advancement of large language models (LLMs) has opened up new possibilities for a wide range of natural language processing tasks. This study explores the potential of LLMs for author profiling in digital text forensics, that is, inferring characteristics such as an author's age and gender from their writing style. This is a crucial task in forensic investigations of anonymous or pseudonymous communications. Experiments were conducted with state-of-the-art LLMs, including Polyglot, EEVE, and Bllossom, to evaluate their author-profiling performance. Several fine-tuning strategies, including full fine-tuning, Low-Rank Adaptation (LoRA), and Quantized LoRA (QLoRA), were compared to determine the most effective way to adapt LLMs to this task. The results show that fine-tuned LLMs can effectively predict authors' age and gender from their writing styles, with Polyglot-based models generally outperforming EEVE- and Bllossom-based models. In addition, LoRA and QLoRA substantially reduce computational cost and memory requirements while achieving performance comparable to full fine-tuning. However, error analysis reveals limitations of the current LLM-based approach, including difficulty capturing subtle linguistic variation across age groups and potential biases inherited from pre-training data. These challenges are discussed, and future research directions to address them are proposed. Overall, this study underscores the potential of LLMs for author profiling in digital text forensics and points to promising avenues for further exploration and refinement.
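The abstract attributes the efficiency of LoRA and QLoRA to reduced computation and memory but gives no formulas. As an illustration only, the Python sketch below uses the standard LoRA formulation (a frozen weight W plus a trainable low-rank update B A scaled by alpha / r) to count trainable parameters for one weight matrix and to compute the adapted forward pass. The matrix size (4096 x 4096), rank r = 8, scaling alpha, the hypothetical 1e9-parameter model, and the 16-bit vs 4-bit storage comparison standing in for LoRA vs QLoRA are assumed values chosen for the example, not settings reported in the study; all function names are hypothetical.

```python
# Sketch only: illustrates the standard LoRA formulation, not the study's code.
# All sizes below (d, k, r, alpha, parameter counts) are hypothetical.


def full_finetune_params(d: int, k: int) -> int:
    # Full fine-tuning updates every entry of a d x k weight matrix W.
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    # LoRA freezes W and trains B (d x r) and A (r x k), giving delta W = B A.
    return r * (d + k)


def weight_storage_bytes(n_params: int, bits: int) -> int:
    # Bytes needed to store frozen weights at a given bit width
    # (ignores quantization constants, activations, and optimizer state).
    return n_params * bits // 8


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    # Plain-Python matrix-vector product: each row of m dotted with v.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha: float, r: int) -> list[float]:
    # h = W x + (alpha / r) * B (A x); only A and B would receive gradients.
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [h + scale * u for h, u in zip(base, update)]


if __name__ == "__main__":
    d = k = 4096  # hypothetical projection size
    r = 8         # hypothetical LoRA rank
    full = full_finetune_params(d, k)
    lora = lora_params(d, k, r)
    print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")

    # Hypothetical 1e9-parameter base model: 16-bit (LoRA) vs 4-bit (QLoRA) storage.
    n = 1_000_000_000
    print(weight_storage_bytes(n, 16) / weight_storage_bytes(n, 4))  # 4.0

    # Toy check: with B initialised to zero, the adapter leaves W x unchanged.
    W = [[1.0, 0.0], [0.0, 1.0]]
    A = [[0.5, 0.5]]    # r = 1, k = 2
    B = [[0.0], [0.0]]  # d = 2, r = 1
    print(lora_forward(W, A, B, [2.0, 4.0], alpha=16, r=1))  # [2.0, 4.0]
```

With these assumed sizes, LoRA trains 65,536 parameters for the matrix instead of 16,777,216 (about 0.39%). The storage helper covers only the weight-storage side of QLoRA's saving; real memory use also depends on quantization constants, activations, and optimizer state.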

More From: Forensic Science International: Digital Investigation

Journal: Forensic Science International: Digital Investigation
Publication Date: Oct 1, 2024
License type: cc-by-nc-nd

Similar Papers

Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
Jingfeng Yang ... Haoming Jiang
ACM Transactions on Knowledge Discovery from Data | VOL. 18 | 26 Apr 2024

BB-GeoGPT: A framework for learning a large language model for geographic information science
Yifan Zhang ... Wenhao Yu
Information Processing and Management | VOL. 61 | 22 Jun 2024

A Large and Diverse Arabic Corpus for Language Modeling
Abbas Raza Ali ... Hasan Raza Ali
Procedia Computer Science | VOL. 225 | 01 Jan 2023

Use of SNOMED CT in Large Language Models: Scoping Review
Eunsuk Chang ... Sumi Sung
JMIR Medical Informatics | VOL. 12 | 07 Oct 2024
