How Well Do Simulated Population Samples with GPT-4 Align with Real Ones? The Case of the Eysenck Personality Questionnaire Revised-Abbreviated Personality Test.

Abstract

Background: Advances in artificial intelligence have enabled the simulation of human-like behaviors, raising the possibility of using large language models (LLMs) to generate synthetic population samples for research purposes, which may be particularly useful in the health and social sciences. Methods: This paper explores the potential of LLMs to simulate population samples mirroring real ones, as well as the feasibility of using personality questionnaires to assess the personality of LLMs. To advance in that direction, two experiments were conducted with GPT-4o using the Eysenck Personality Questionnaire Revised-Abbreviated (EPQR-A) in six languages: Spanish, English, Slovak, Hebrew, Portuguese, and Turkish. Results: We find that GPT-4o exhibits distinct personality traits, which vary based on parameter settings and the language of the questionnaire. While the model shows promising trends in reflecting certain personality traits and differences across gender and academic fields, discrepancies between the synthetic populations' responses and those from real populations remain. Conclusions: These inconsistencies suggest that creating fully reliable synthetic population samples for questionnaire testing is still an open challenge. Further research is required to better align synthetic and real population behaviors.

Similar Papers
  • Research Article
  • 10.1145/3712301
Matching GPT-simulated Populations with Real Ones in Psychological Studies—The Case of the EPQR-A Personality Test
  • Apr 25, 2025
  • ACM Transactions on Computing for Healthcare
  • Gregorio Ferreira + 3 more

This article analyzes how well OpenAI's LLM GPT-4 can emulate different personalities and simulate populations to answer psychological questionnaires similarly to real population samples. For this purpose, we performed different experiments with the Eysenck Personality Questionnaire Revised-Abbreviated (EPQR-A) in three languages (Spanish, English, and Slovak). The EPQR-A measures personality on four scales: extraversion (E: sociability), neuroticism (N: emotional stability), psychoticism (P: tendency to break social rules and lack of empathy), and lying (L: social desirability). We perform a comparative analysis of the answers of synthetic populations against those of two real population samples of Spanish students, as well as against the unconditioned baseline personality of GPT. Furthermore, the impact of time (the year in which the questionnaire is answered), questionnaire language, and student age and gender is analyzed. To our knowledge, this is the first time the EPQR-A test has been used to assess GPT's personality, and the first time the impact of different language versions and of time has been measured. Our analysis reveals that GPT-4 exhibits an extraverted, emotionally stable personality with low psychoticism levels and high social desirability. GPT-4 replicates some gender differences observed in real populations but only partially reproduces their overall results.

  • Conference Article
  • Cited by 4
  • 10.1109/jcsse.2016.7748838
Generating synthetic population at individual and household levels with aggregate data
  • Jul 1, 2016
  • Natthaporn Watthanasutthi + 1 more

Population synthesis is the process of creating data records of individual persons and households, with associated attributes, that closely resemble the real population. It is the basis of microsimulation models for applications such as urban planning, crime modeling, and epidemiology. This work aims to create a synthetic Thai population at the provincial scale. Our synthetic population generator is based on the synthetic reconstruction method, which is most suitable where only aggregate census data are available, as in Thailand. With the census tabulations available from various government agencies, the generator combines 16 tabulations at the individual and household levels using conditional probabilities. The order of the conditional probabilities is designed according to the dependencies between the attributes and the differing resolutions of the data from multiple sources. The main contribution of this work is the method for generating complex household types. Many family-related attributes are used to create family relationships among individuals; families and individuals are then assigned to households according to household statistics. The generator is evaluated by creating a synthetic population of Phitsanulok, a province with 835,555 individuals and 296,807 households localized in 18 municipality areas of 9 districts. The aggregated tabulations of the synthetic population are compared with the original ones. The results show that the distributions of the aggregated attributes are very close to the source data, so the synthetic population is a good approximation of the real population.

  • Research Article
  • Cited by 8
  • 10.1287/ijds.2023.0007
How Can IJDS Authors, Reviewers, and Editors Use (and Misuse) Generative AI?
  • Apr 1, 2023
  • INFORMS Journal on Data Science
  • Galit Shmueli + 7 more


  • Research Article
  • 10.1111/j.1365-2966.2012.20198.x
Exposing the hidden white dwarf binary origin by means of a synthetic population model for nearby single stars
  • Jan 27, 2012
  • Monthly Notices of the Royal Astronomical Society
  • S A Dawson + 1 more

We present a new synthetic population model for local thin-disc (d≤ 100 pc) single-star white dwarfs (WDs), including effective temperature and mass distribution. The only two parameters of the synthetic population model are the initial mass function (IMF) and the star-formation rate (SFR). Depletion losses through kinematic heating of the stellar ‘gas’ vertical to the Galactic plane are prescribed empirically using the observed local velocity dispersion as a function of age. We apply the same SFR and IMF for the WD population model as previously determined from a study based upon the latest Hipparcos and binary catalogue data to yield a matching synthetic single-star population. A striking result of comparing the synthetic WD population with the complete local observed sample (with d < 13 pc) is the excellent agreement between the absolute number of WDs when binary stars are not excluded from the empirical basis used to calibrate our synthetic population. When looking at the total expected WD number after a rigorous accounting for binary stars, we see that this is significantly lower than the corresponding observed WD number. Hence, many of these apparently single WDs must have a hidden binary-system history, i.e. some may be end products of binary mergers or of mass overflows. We suspect that many of the WDs exist in hidden double-degenerate systems. There is good agreement between the temperature distribution of our synthetic WD sample and observations, as well as between our synthetic mass-distribution peak (at 0.67 M⊙) and the one recently observed (0.65 M⊙). Remarkably, both values are about 0.06 M⊙ higher than those stated by earlier studies. In the case of our synthetic sample, older stars of lower mass experience a greater amount of dynamic depletion and the remaining local WDs within the sample tend to be slightly more massive. The small, remaining discrepancy may be explained by the stated contamination by WDs of binary origin.

  • Research Article
  • Cited by 7
  • 10.1016/j.trpro.2016.02.078
Freight Activity Chain Generation Using Complex Networks of Connectivity
  • Jan 1, 2016
  • Transportation Research Procedia
  • Johan W Joubert + 1 more


  • Research Article
  • Cited by 1
  • 10.1093/mnras/stae2055
The VISCACHA Survey – XI. Benchmarking SIESTA: a new synthetic CMD fitting code
  • Aug 31, 2024
  • Monthly Notices of the Royal Astronomical Society
  • Bernardo P L Ferreira + 18 more

We present a novel code, named SIESTA (Statistical matchIng between rEal and Synthetic sTellar popuLations), designed for performing statistical isochrone fitting to colour–magnitude diagrams (CMDs) of single stellar populations by leveraging comparisons between the observed stellar distribution and predictions from synthetic populations, simulated on top of a grid of isochrones. These synthetic populations encompass determinant factors such as the cluster’s initial mass function (IMF), the presence of non-resolved binaries, as well as the expected photometric errors, and observational completeness (or the observed luminosity function). Employing Markov Chain Monte Carlo within a Bayesian framework, SIESTA allows for the determination of a cluster’s age, metallicity, distance, colour excess, and binary fraction (with masses exceeding a certain ratio). In this study, we rigorously benchmark the SIESTA code utilizing synthetic populations and evaluate its performance against observations from the VISCACHA Survey in the Small Magellanic Cloud, focusing on five star clusters: Lindsay 114, NGC 152, Lindsay 91, Lindsay 113, and NGC 121. These clusters were chosen for their diverse age range, spanning from 0.04 to 10 Gyr. Our findings demonstrate the capability of the SIESTA code to accurately represent the observed CMDs of these clusters. Furthermore, we compare the results obtained with SIESTA to previous characterizations of these clusters, highlighting the consistency between the derived metallicity and spectroscopic determinations from various sources.

  • Research Article
  • Cited by 2
  • 10.2139/ssrn.2086345
Generating a Close-to-Reality Synthetic Population of Ghana
  • Jun 18, 2012
  • SSRN Electronic Journal
  • Tyler J Frazier + 1 more

The purpose of this research is to generate a close-to-reality synthetic human population for use in a geosimulation of urban dynamics. Two commonly accepted approaches to generating synthetic human populations are Iterative Proportional Fitting (IPF) and Resampling with Replacement. While these methods are effective at reproducing one instance of the probability model describing the survey, it is an instance with extremely small variability amongst subgroups and is very unlikely to be the real population. IPF and Resampling with Replacement also rely on pure replication of units from the underlying sample which can increase unrealistic model behavior. In this work we present a sequential logic for estimating variables using multinomial logistic regressions and the conditional probabilities amongst each variable in order to generate combinations which were not represented in the original survey but are likely to occur in the real population. We also present a model based approach to imputing missing observation responses and apply the methodology to the Ghana Living Standard Survey 5 (GLSS5) in order to generate a comprehensive synthetic population for the Republic of Ghana, including such household and person variables as household size, tribal affiliation, educational attainment and annual income, amongst others. The R language and environment for statistical computing was used as well as the packages VIM and simPopulation in developing and executing the code. Contingency coefficients, cumulative distributions, mosaic plots, and box plots are presented for evaluation in order to demonstrate the effectiveness of the new method in its application to Ghana.
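The Iterative Proportional Fitting (IPF) method that this abstract critiques can be sketched in a few lines; the following is a minimal two-dimensional illustration with made-up seed and marginal values, not the paper's actual pipeline or the GLSS5 data.

```python
import numpy as np

def ipf(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Iterative Proportional Fitting: alternately rescale a seed
    contingency table so its row and column sums match target marginals."""
    table = seed.astype(float).copy()
    for _ in range(max_iter):
        # Scale each row to hit the target row sums
        table *= (row_targets / table.sum(axis=1))[:, None]
        # Scale each column to hit the target column sums
        table *= col_targets / table.sum(axis=0)
        if np.allclose(table.sum(axis=1), row_targets, atol=tol):
            break
    return table

# Toy example: survey seed counts fitted to census marginals
seed = np.array([[1.0, 2.0], [3.0, 4.0]])
rows = np.array([40.0, 60.0])   # hypothetical target row sums
cols = np.array([55.0, 45.0])   # hypothetical target column sums
fitted = ipf(seed, rows, cols)
```

Note that IPF only rescales cells present in the seed, which is exactly the limitation the abstract points out: attribute combinations absent from the original survey can never appear in the fitted table.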

  • Research Article
  • Cited by 16
  • 10.1162/daed_e_01897
Getting AI Right: Introductory Notes on AI & Society
  • May 1, 2022
  • Daedalus
  • James Manyika


  • Research Article
  • Cited by 9
  • 10.1016/j.jbi.2020.103408
Empirically-derived synthetic populations to mitigate small sample sizes.
  • Mar 12, 2020
  • Journal of Biomedical Informatics
  • Erin E Fowler + 5 more


  • Research Article
  • 10.2196/65729
Utility-based Analysis of Statistical Approaches and Deep Learning Models for Synthetic Data Generation With Focus on Correlation Structures: Algorithm Development and Validation.
  • Mar 20, 2025
  • JMIR AI
  • Marko Miletic + 1 more

Recent advancements in Generative Adversarial Networks and large language models (LLMs) have significantly advanced the synthesis and augmentation of medical data. These and other deep learning-based methods offer promising potential for generating high-quality, realistic datasets crucial for improving machine learning applications in health care, particularly in contexts where data privacy and availability are limiting factors. However, challenges remain in accurately capturing the complex associations inherent in medical datasets. This study evaluates the effectiveness of various Synthetic Data Generation (SDG) methods in replicating the correlation structures inherent in real medical datasets. In addition, it examines their performance in downstream tasks using Random Forests (RFs) as the benchmark model. To provide a comprehensive analysis, alternative models such as eXtreme Gradient Boosting and Gated Additive Tree Ensembles are also considered. We compare the following SDG approaches: Synthetic Populations in R (synthpop), copula, copulagan, Conditional Tabular Generative Adversarial Network (ctgan), tabular variational autoencoder (tvae), and tabula for LLMs. We evaluated synthetic data generation methods using both real-world and simulated datasets. Simulated data consist of 10 Gaussian variables and one binary target variable with varying correlation structures, generated via Cholesky decomposition. Real-world datasets include the body performance dataset with 13,393 samples for fitness classification, the Wisconsin Breast Cancer dataset with 569 samples for tumor diagnosis, and the diabetes dataset with 768 samples for diabetes prediction. Data quality is evaluated by comparing correlation matrices, the propensity score mean-squared error (pMSE) for general utility, and F1-scores for downstream tasks as a specific utility metric, using training on synthetic data and testing on real data. 
Our simulation study, supplemented with real-world data analyses, shows that the statistical methods copula and synthpop consistently outperform deep learning approaches across various sample sizes and correlation complexities, with synthpop being the most effective. Deep learning methods, including LLMs, show mixed performance, particularly with smaller datasets or limited training epochs. LLMs often struggle to replicate numerical dependencies effectively. In contrast, methods like tvae with 10,000 epochs perform comparably well. On the body performance dataset, copulagan achieves the best performance in terms of pMSE. The results also highlight that model utility depends more on the relative correlations between features and the target variable than on the absolute magnitude of correlation matrix differences. Statistical methods, particularly synthpop, demonstrate superior robustness and utility preservation for synthetic tabular data compared with deep learning approaches. Copula methods show potential but face limitations with integer variables. Deep learning methods underperform in this context. Overall, these findings underscore the dominance of statistical methods for synthetic tabular data generation, while highlighting the niche potential of deep learning approaches for highly complex datasets, given adequate resources and tuning.
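The simulated-data setup described in this abstract (correlated Gaussian variables plus a binary target, generated via Cholesky decomposition) can be sketched as follows. The equicorrelated structure, sample size, and threshold rule for the binary target are illustrative assumptions, not the study's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, rho = 5000, 10, 0.5

# Target covariance: equicorrelated standard Gaussians with correlation rho
cov = np.full((p, p), rho)
np.fill_diagonal(cov, 1.0)

# The Cholesky factor maps independent normals to the target correlation
L = np.linalg.cholesky(cov)
X = rng.standard_normal((n, p)) @ L.T

# Binary target derived from a noisy linear score on the features
y = (X @ np.ones(p) + rng.standard_normal(n) > 0).astype(int)

# Empirical correlations should approach the target as n grows
emp = np.corrcoef(X, rowvar=False)
max_err = np.abs(emp - cov).max()
```

A synthetic-data generator can then be judged by how closely the correlation matrix of its output reproduces `emp`, which is the correlation-structure comparison the study performs.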

  • Discussion
  • 10.1111/cogs.13430
Large Language Models: A Historical and Sociocultural Perspective.
  • Mar 1, 2024
  • Cognitive science
  • Eugene Yu Ji

This letter explores the intricate historical and contemporary links between large language models (LLMs) and cognitive science through the lens of information theory, statistical language models, and socioanthropological linguistic theories. The emergence of LLMs highlights the enduring significance of information-based and statistical learning theories in understanding human communication. These theories, initially proposed in the mid-20th century, offered a visionary framework for integrating computational science, social sciences, and humanities, which nonetheless was not fully fulfilled at that time. The subsequent development of sociolinguistics and linguistic anthropology, especially since the 1970s, provided critical perspectives and empirical methods that both challenged and enriched this framework. This letter proposes that two pivotal concepts derived from this development, metapragmatic function and indexicality, offer a fruitful theoretical perspective for integrating the semantic, textual, and pragmatic, contextual dimensions of communication, an amalgamation that contemporary LLMs have yet to fully achieve. The author believes that contemporary cognitive science is at a crucial crossroads, where fostering interdisciplinary dialogues among computational linguistics, social linguistics and linguistic anthropology, and cognitive and social psychology is in particular imperative. Such collaboration is vital to bridge the computational, cognitive, and sociocultural aspects of human communication and human-AI interaction, especially in the era of large language and multimodal models and human-centric Artificial Intelligence (AI).

  • Conference Article
  • 10.5555/2484920.2485267
Towards "live" synthetic populations for large-scale realistic multiagent simulations
  • May 6, 2013
  • Nidhi Parikh

Synthetic populations attempt to capture the population dynamics of a geographic region and are hence widely used in large-scale multiagent applications simulating real-world phenomena. However, current synthetic populations are mostly static: individuals are assumed to perform the same daily routine every day. My thesis aims at taking the first step towards a "live" synthetic population that updates automatically to reflect changes in the real population, by incorporating information from social media and other online data resources. As an initial step, I have extended the synthetic population model for the Washington DC metro area to include the transient (tourist and business traveler) population, combining data from various online and offline data resources by hand. This subpopulation, which keeps changing over time, has also been shown to have an important effect on the disease dynamics of the city. Next, I propose to use information from social media to improve the activity patterns of individuals using a hidden semi-Markov model.

  • Research Article
  • 10.58239/tamde.2025.01.006.x
https://dergipark.org.tr/tr/pub/tamde/issue/92196/1674425
  • May 31, 2025
  • TAM Akademi Dergisi
  • Zahid Zufar At Thaariq + 3 more

This article discusses trends in the use of artificial intelligence (AI) in social sciences and natural sciences research. The introduction highlights how AI has evolved into an essential tool in both fields, addressing the limitations of traditional methods in the social sciences and accelerating data analysis in the natural sciences. The research method used is bibliometric analysis, with data collected from Google Scholar using keywords related to AI in the social and natural sciences. Relevant articles were selected through a content evaluation and exclusion process (e.g., excluding publications outside the five-year range of 2020 to 2025), resulting in 1,000 social science publications and 999 natural science publications, which were further analyzed using VOSviewer. The study's findings indicate that in the social sciences, AI is widely used to enhance research effectiveness through faster data processing, particularly in higher education and social policy analysis. Additionally, AI studies in the social sciences are expanding, focusing on ethics, regulation, and human-AI interaction. In the natural sciences, AI plays a crucial role in resource management, environmental research, and the healthcare industry, including disease diagnosis and drug development. Recent trends also show an increasing use of large language models (LLMs) and natural language processing (NLP) in scientific research. The study concludes that AI has become a key element in both social and natural science research. Recommendations for social science researchers include further exploration of AI's impact on psychology, law, and education, as well as the use of bibliometric methods. Meanwhile, natural science researchers are advised to focus on improving AI transparency, developing more accurate technologies, and applying AI in environmental and industrial research. Interdisciplinary collaboration is necessary to ensure AI development remains ethical and inclusive.

  • Research Article
  • Cited by 3
  • 10.1016/j.joms.2024.11.007
Evaluating Artificial Intelligence Chatbots in Oral and Maxillofacial Surgery Board Exams: Performance and Potential
  • Mar 1, 2025
  • Journal of Oral and Maxillofacial Surgery
  • Reema Mahmoud + 5 more


  • Research Article
  • Cited by 118
  • 10.1097/corr.0000000000002704
Can Artificial Intelligence Pass the American Board of Orthopaedic Surgery Examination? Orthopaedic Residents Versus ChatGPT.
  • May 23, 2023
  • Clinical orthopaedics and related research
  • Zachary C Lum

Advances in neural networks, deep learning, and artificial intelligence (AI) have progressed recently. Previous deep learning AI has been structured around domain-specific areas that are trained on dataset-specific areas of interest that yield high accuracy and precision. A new AI model using large language models (LLM) and nonspecific domain areas, ChatGPT (OpenAI), has gained attention. Although AI has demonstrated proficiency in managing vast amounts of data, implementation of that knowledge remains a challenge. (1) What percentage of Orthopaedic In-Training Examination questions can a generative, pretrained transformer chatbot (ChatGPT) answer correctly? (2) How does that percentage compare with results achieved by orthopaedic residents of different levels, and if scoring lower than the 10th percentile relative to 5th-year residents is likely to correspond to a failing American Board of Orthopaedic Surgery score, is this LLM likely to pass the orthopaedic surgery written boards? (3) Does increasing question taxonomy affect the LLM's ability to select the correct answer choices? This study randomly selected 400 of 3840 publicly available questions based on the Orthopaedic In-Training Examination and compared the mean score with that of residents who took the test over a 5-year period. Questions with figures, diagrams, or charts were excluded, including five questions the LLM could not provide an answer for, resulting in 207 questions administered with raw score recorded. The LLM's answer results were compared with the Orthopaedic In-Training Examination ranking of orthopaedic surgery residents. Based on the findings of an earlier study, a pass-fail cutoff was set at the 10th percentile. 
Questions answered were then categorized based on the Buckwalter taxonomy of recall, which deals with increasingly complex levels of interpretation and application of knowledge; comparison was made of the LLM's performance across taxonomic levels and was analyzed using a chi-square test. ChatGPT selected the correct answer 47% (97 of 207) of the time, and 53% (110 of 207) of the time it answered incorrectly. Based on prior Orthopaedic In-Training Examination testing, the LLM scored in the 40th percentile for postgraduate year (PGY) 1s, the eighth percentile for PGY2s, and the first percentile for PGY3s, PGY4s, and PGY5s; based on the latter finding (and using a predefined cutoff of the 10th percentile of PGY5s as the threshold for a passing score), it seems unlikely that the LLM would pass the written board examination. The LLM's performance decreased as question taxonomy level increased (it answered 54% [54 of 101] of Tax 1 questions correctly, 51% [18 of 35] of Tax 2 questions correctly, and 34% [24 of 71] of Tax 3 questions correctly; p = 0.034). Although this general-domain LLM has a low likelihood of passing the orthopaedic surgery board examination, its testing performance and knowledge are comparable to those of a first-year orthopaedic surgery resident. The LLM's ability to provide accurate answers declines with increasing question taxonomy and complexity, indicating a deficiency in implementing knowledge. Current AI appears to perform better at knowledge- and interpretation-based inquiries, and based on this study and other areas of opportunity, it may become an additional tool for orthopaedic learning and education.
