Does Truth Pay? Investigating the Effectiveness of the Bayesian Truth Serum With an Interim Payment: A Registered Report

Abstract

Self-report data are vital in psychological research, but biases such as careless responding and socially desirable responding can compromise their validity. Although various methods are employed to mitigate these biases, they have limitations. The Bayesian truth serum (BTS) offers a survey scoring method to incentivize truthfulness by leveraging correlations between personal and collective opinions and rewarding “surprisingly common” responses. In this study, we evaluated the effectiveness of the BTS in mitigating socially desirable responding to sensitive questions and tested whether an interim payment could enhance its efficacy by increasing trust. In a between-subjects experimental survey, 877 participants were randomly assigned to one of three conditions: BTS, BTS with interim payment, and regular incentive (RI). Contrary to the hypotheses, participants in the BTS conditions displayed lower agreement with socially undesirable statements compared with the RI condition. The interim payment did not significantly enhance the BTS’s effectiveness. Instead, response patterns diverged from the mechanism’s intended effects, raising concerns about its robustness. As the second registered report to challenge its efficacy, this study’s results cast serious doubt on the BTS as a reliable tool for mitigating socially desirable responding and improving the validity of self-report data in psychological research.
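The scoring rule the abstract summarizes can be sketched in a few lines. The following is a minimal illustration of the published BTS formula from Prelec (2004), not the implementation used in this study; the function name, the `eps` smoothing constant, and the example data are our own. Each respondent both answers the item and predicts how the population will answer; the information score rewards answers that turn out to be more common than the group collectively predicted, and the prediction score is a (negative) penalty for inaccurate forecasts.

```python
import math

def bts_scores(answers, predictions, alpha=1.0, eps=1e-9):
    """Illustrative Bayesian Truth Serum scores (after Prelec, 2004).

    answers:     chosen option index per respondent
    predictions: per-respondent predicted option frequencies (each sums to 1)
    Returns one score per respondent: information score + alpha * prediction score.
    """
    n = len(answers)
    n_opts = len(predictions[0])

    # Empirical endorsement frequencies x̄_k
    x_bar = [sum(1 for a in answers if a == k) / n for k in range(n_opts)]

    # Log geometric mean of predicted frequencies ȳ_k
    log_y_bar = [sum(math.log(p[k] + eps) for p in predictions) / n
                 for k in range(n_opts)]

    scores = []
    for a, pred in zip(answers, predictions):
        # Information score: positive when the chosen answer is
        # "surprisingly common" (more frequent than collectively predicted).
        info = math.log(x_bar[a] + eps) - log_y_bar[a]
        # Prediction score: negative KL divergence of the actual
        # frequencies from this respondent's forecast.
        pred_score = sum(xk * math.log((pred[k] + eps) / (xk + eps))
                         for k, xk in enumerate(x_bar) if xk > 0)
        scores.append(info + alpha * pred_score)
    return scores
```

For example, if three of four respondents endorse an option that everyone forecast at 50%, that option is surprisingly common, so its endorsers receive positive scores while the lone dissenter's score is negative. Truthful answering is the expected-payoff-maximizing strategy under the mechanism's Bayesian assumptions.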

References (showing 10 of 40 papers)
  • De Jong, M. G., et al. (2010). Reducing Social Desirability Bias through Item Randomized Response: An Application to Measure Underreported Desires. Journal of Marketing Research. doi:10.1509/jmkr.47.1.14
  • Prelec, D. (2004). A Bayesian truth serum for subjective data. Science. doi:10.1126/science.1102081
  • Schoenegger, P., et al. (2022). Taking a Closer Look at the Bayesian Truth Serum. Experimental Psychology. doi:10.1027/1618-3169/a000558
  • Choi, I., et al. (2019). Cross-Cultural Examination of the False Consensus Effect. Frontiers in Psychology. doi:10.3389/fpsyg.2019.02747
  • Lilienfeld, S. O., et al. (2020). Psychological measurement and the replication crisis: Four sacred cows. Canadian Psychology / Psychologie canadienne. doi:10.1037/cap0000236
  • John, L. K., et al. (2012). Measuring the Prevalence of Questionable Research Practices With Incentives for Truth Telling. Psychological Science. doi:10.1177/0956797611430953
  • Brown, G. W., et al. (1951). On Median Tests for Linear Hypotheses. doi:10.1525/9780520411586-013
  • Tourangeau, R., et al. (2007). Sensitive questions in surveys. Psychological Bulletin. doi:10.1037/0033-2909.133.5.859
  • Trautmann, S. T., et al. (2014). Belief Elicitation: A Horse Race among Truth Serums. The Economic Journal. doi:10.1111/ecoj.12160
  • Mullen, B., et al. (1985). The false consensus effect: A meta-analysis of 115 hypothesis tests. Journal of Experimental Social Psychology. doi:10.1016/0022-1031(85)90020-4

Similar Papers
Exploring the Utility of Bayesian Truth Serum for Assessing Design Knowledge
  • Research Article. Scarlett R. Miller et al., Human–Computer Interaction, Jun 17, 2014. doi:10.1080/07370024.2013.870393

Expanding and improving design knowledge is a vital part of higher education due to the growing demand for employees who can think both critically and creatively. However, developing effective methods for assessing what students have learned in design courses is one of the most elusive challenges of design education due to the subjective nature of design. For example, evaluating design outcomes is problematic due to the common pattern of increasing enrollments and reduced resources for design instruction. In this article, we propose and evaluate a new assessment method that uses a novel application of Bayesian Truth Serum (BTS), a scoring algorithm, in order to provide a scalable and reliable measure of design knowledge. This method requires no subjective input from the design instructor, nor does it require answers to questions that have distinct right or wrong answers. We tested this method over a 4-week period with 71 design students in an upper-level design course. For the study, participants were asked to provide responses to multiple-choice BTS survey questions, generate ideas for a design problem, and provide feedback on other participants' ideas. The survey data were used to calculate BTS indices of expertise and statistical tests were performed to determine how the indices correlated with participant ideation and critique proficiency. The results from this study show a modest correlation between the BTS indices of expertise and later performance on generative design tasks and a correlation between the students' ability to critique designs and their BTS scores. These findings suggest that the BTS assessment method can be used to supplement existing evaluation practices for individual design assessment, particularly in courses where group projects are used as the primary means of evaluation. 
In addition, the results show promise for using the BTS method in classes where design projects or design critiques are not feasible due to time constraints or large class sizes.

Performance comparison of two truth telling incentive mechanisms: An experimental method
  • Conference Article. Min Yang et al., Dec 1, 2016. doi:10.1109/ieem.2016.7798045

An experiment predicting National Basketball Association (NBA) results was designed to compare the performance of the Bayesian truth serum (BTS) and the robust Bayesian truth serum (RBTS) in incentivizing experts to tell the truth. It shows that (1) the fundamental hypothesis of BTS, i.e., that individuals treat personal opinions as an "impersonally informative" signal about the population distribution, is valid; (2) the relative position of an individual within a group significantly affects that individual's RBTS score owing to the RBTS algorithm, which leads to a poor incentive to tell the truth, whereas BTS does not have this problem; and (3) BTS does not encourage extreme opinions, while RBTS cannot effectively reward extreme opinions because of significant deviations in their RBTS scores. The analytic results favor BTS, but when the number of individuals is small, RBTS should be employed.

Taking a Closer Look at the Bayesian Truth Serum
  • Research Article. Philipp Schoenegger et al., Experimental Psychology, Jul 1, 2022. doi:10.1027/1618-3169/a000558

Over the past few decades, psychology and its cognate disciplines have undergone substantial scientific reform, ranging from advances in statistical methodology to significant changes in academic norms. One aspect of experimental design that has received comparatively little attention is incentivization, i.e., the way that participants are rewarded and incentivized monetarily for their participation in experiments and surveys. While incentive-compatible designs are the norm in disciplines like economics, the majority of studies in psychology and experimental philosophy are constructed such that individuals' incentives to maximize their payoffs in many cases stand opposed to their incentives to state their true preferences honestly. This is in part because the subject matter is often self-report data about subjective topics, and the sample is drawn from online platforms like Prolific or MTurk where many participants are out to make a quick buck. One mechanism that allows for the introduction of an incentive-compatible design in such circumstances is the Bayesian Truth Serum (BTS; Prelec, 2004), which rewards participants based on how surprisingly common their answers are. Recently, Schoenegger (2021) applied this mechanism in the context of Likert-scale self-reports, finding that the introduction of this mechanism significantly altered response behavior. In this registered report, we further investigate this mechanism by (1) attempting to directly replicate the previous result and (2) analyzing if the Bayesian Truth Serum's effect is distinct from the effects of its constituent parts (increase in expected earnings and addition of prediction tasks). We fail to find significant differences in response behavior between participants who were simply paid for completing the study and participants who were incentivized with the BTS. 
Per our pre-registration, we regard this as evidence in favor of a null effect of up to V = .1 and a failure to replicate but reserve judgment as to whether the BTS mechanism should be adopted in social science fields that rely heavily on Likert-scale items reporting subjective data, seeing that smaller effect sizes might still be of practical interest and results may differ for items different from the ones we studied. Further, we provide weak evidence that the prediction task itself influences response distributions and that this task's effect is distinct from an increase in expected earnings, suggesting a complex interaction between the BTS' constituent parts and its truth-telling instructions.

Creating Truth-Telling Incentives with the Bayesian Truth Serum
  • Research Article. Ray Weaver et al., Journal of Marketing Research, Jun 1, 2013. doi:10.1509/jmr.09.0039

The Bayesian truth serum (BTS) is a survey scoring method that creates truth-telling incentives for respondents answering multiple-choice questions about intrinsically private matters, such as opinions, tastes, and behavior. The authors test BTS in several studies, primarily using recognition questionnaires that present items such as brand names and scientific terms. One-third of the items were nonexistent foils. The BTS mechanism, which mathematically rewards “surprisingly common” answers, both rewarded truth telling, by heavily penalizing foil recognition, and induced truth telling, in that participants who were paid according to their BTS scores claimed to recognize fewer foils than control groups, even when given competing incentives to exaggerate. Survey takers who received BTS-based payments without explanation became less likely to recognize foils as they progressed through the survey, suggesting that they learned to respond to BTS incentives despite the absence of guidance. The mechanism also outperformed the solemn oath, a competing truth-inducement mechanism. Finally, when applied to judgments about contributing to a public good, BTS eliminated the bias common in contingent valuation elicitations.

Validating Bayesian truth serum in large-scale online human experiments
  • Research Article. Morgan R. Frank et al., PLOS ONE, May 11, 2017. doi:10.1371/journal.pone.0177385

Bayesian truth serum (BTS) is an exciting new method for improving honesty and information quality in multiple-choice surveys, but, despite the method's mathematical reliance on large sample sizes, the existing literature on BTS focuses only on small experiments. Given the prevalence of online survey platforms, such as Amazon's Mechanical Turk, which facilitate surveys with hundreds or thousands of participants, BTS must prove effective in large-scale experiments if it is to become a readily accepted tool in real-world applications. We demonstrate that BTS quantifiably improves honesty in large-scale online surveys where the "honest" distribution of answers is known in expectation on aggregate. Furthermore, we explore a marketing application where "honest" answers cannot be known, but find that BTS treatment impacts the resulting distributions of answers.

A Robust Bayesian Truth Serum for Small Populations
  • Research Article. Jens Witkowski et al., Proceedings of the AAAI Conference on Artificial Intelligence, Sep 20, 2021. doi:10.1609/aaai.v26i1.8261

Peer prediction mechanisms allow the truthful elicitation of private signals (e.g., experiences, or opinions) in regard to a true world state when this ground truth is unobservable. The original peer prediction method is incentive compatible for any number of agents n >= 2, but relies on a common prior, shared by all agents and the mechanism. The Bayesian Truth Serum (BTS) relaxes this assumption. While BTS still assumes that agents share a common prior, this prior need not be known to the mechanism. However, BTS is only incentive compatible for a large enough number of agents, and the particular number of agents required is uncertain because it depends on this private prior. In this paper, we present a robust BTS for the elicitation of binary information which is incentive compatible for every n >= 3, taking advantage of a particularity of the quadratic scoring rule. The robust BTS is the first peer prediction mechanism to provide strict incentive compatibility for every n >= 3 without relying on knowledge of the common prior. Moreover, and in contrast to the original BTS, our mechanism is numerically robust and ex post individually rational.

Incentivizing Responses to Self-report Questions in Perceptual Deterrence Studies: An Investigation of the Validity of Deterrence Theory Using Bayesian Truth Serum
  • Research Article. Thomas A. Loughran et al., Journal of Quantitative Criminology, Mar 12, 2014. doi:10.1007/s10940-014-9219-4

Objective: Criminological researchers want people to reveal considerable private information when utilizing self-report surveys, such as involvement in crime, subjective attitudes and expectations, and probability judgments. Some of this private information is easily accessible for subjects, and all that is required is for individuals to be honest, while other information requires mental effort and cognitive reflection. Though researchers generally provide little or no incentive to be honest and thoughtful, it is generally assumed that subjects do provide honest and accurate information. We assess the accuracy of deterrence measures by employing a scoring rule known as the Bayesian truth serum (BTS), which incentivizes honesty and thoughtfulness among respondents.

Unraveling hypothetical bias in discrete choice experiments
  • Research Article. Luisa Menapace et al., Journal of Economic Behavior & Organization, Jun 13, 2020. doi:10.1016/j.jebo.2020.04.020


Research using archival data
  • Book Chapter. Gwenith G. Fisher et al., Aug 14, 2018. doi:10.4324/9781315517971-9

This chapter discusses the use of archival data in psychological research. Archival data are existing data that were collected for a purpose other than their current use. First, we review several sources and types of archival data that may be of particular interest to applied psychology researchers, including social science data archives, public documents, datasets, official records, private documents or records, and mass media. Next, we discuss advantages and disadvantages that are unique to archival data use in psychological research, such as the opportunity to access datasets with a specific methodology, design, or population, along with the inherent challenges of navigating large, complex datasets with limited control over whether and how variables of interest are measured. Finally, we conclude with a series of recommendations for anyone who may be considering the use of archival data for a research project. We hope this chapter serves as a useful introduction to archival data and a helpful resource for those seeking to use archival data in psychological research.

Game of Duels: Information-Theoretic Axiomatization of Scoring Rules
  • Research Article. Jaksa Cvitanic et al., IEEE Transactions on Information Theory, Jan 1, 2019. doi:10.1109/tit.2018.2867469

This paper aims to develop the insights into Bayesian truth serum (BTS) mechanism by postulating a sequence of seven natural conditions reminiscent of axioms in information theory. The condition that reduces a larger family of mechanisms to BTS is additivity, akin to the axiomatic development of entropy. The seven conditions identify BTS as the unique scoring rule for ranking respondents in situations in which respondents are asked to choose an alternative from a finite set and provide predictions of their peers’ propensities to choose, for finite or infinite sets of respondents.

The Benefits of Farm Animal Welfare Legislation: The Case of the EU Broiler Directive and Truthful Reporting
  • Research Article. Richard Bennett et al., Journal of Agricultural Economics, May 25, 2018. doi:10.1111/1477-9552.12278

The EU Broiler Directive came into force in the UK in June 2010 with the aim of setting new minimum standards, monitoring broiler welfare, and addressing any welfare problems. A survey questionnaire was used to elicit information from a stratified sample of citizens in England and Wales regarding their willingness to pay for the provisions of the Directive, as an estimate of the consumer surplus associated with the legislation. We also explore the usefulness of Prelec's (2004) Bayesian Truth Serum (BTS) in promoting respondents' truthful reporting. A median willingness to pay of £21.50 per household per year (corrected for sample bias and possible 'yea saying') was estimated from 665 responses. This provides an estimated benefit of the legislation to citizens of over £503 million per year, equivalent to 5.3% of current consumer expenditure on chicken. This compares to an estimated £22 million per year cost of producers' compliance and government enforcement associated with the legislation. No statistically significant differences were found between the responses of respondents who did and did not have a BTS incentive to answer truthfully, which might reflect apparently truthful answers in this case, an insufficiently strong financial incentive, or a weakened effect due to an element of disbelief in the BTS amongst the sample. The analysis suggests that the benefits of the Broiler Directive to citizens greatly outweigh the additional costs to producers, making a case for the legislation to be retained.

Game Theory for Data Science: Eliciting Truthful Information
  • Research Article. Boi Faltings et al., Synthesis Lectures on Artificial Intelligence and Machine Learning, Sep 19, 2017. doi:10.2200/s00788ed1v01y201707aim035

Intelligent systems often depend on data provided by information agents, for example, sensor data or crowdsourced human computation. Providing accurate and relevant data requires costly effort that agents may not always be willing to provide. Thus, it becomes important not only to verify the correctness of data, but also to provide incentives so that agents that provide high-quality data are rewarded while those that do not are discouraged by low rewards. We cover different settings and the assumptions they admit, including sensing, human computation, peer grading, reviews, and predictions. We survey different incentive mechanisms, including proper scoring rules, prediction markets and peer prediction, Bayesian Truth Serum, Peer Truth Serum, Correlated Agreement, and the settings where each of them would be suitable. As an alternative, we also consider reputation mechanisms. We complement the game-theoretic analysis with practical examples of applications in prediction platforms, community sensing, and peer grading.

The Use of Questionable Research Practices to Survive in Academia Examined With Expert Elicitation, Prior-Data Conflicts, Bayes Factors for Replication Effects, and the Bayes Truth Serum
  • Research Article. Rens van de Schoot et al., Frontiers in Psychology, Nov 29, 2021. doi:10.3389/fpsyg.2021.621547

The popularity and use of Bayesian methods have increased across many research domains. The current article demonstrates how some less familiar Bayesian methods can be used. Specifically, we applied expert elicitation, testing for prior-data conflicts, the Bayesian Truth Serum, and testing for replication effects via Bayes Factors in a series of four studies investigating the use of questionable research practices (QRPs). Scientifically fraudulent or unethical research practices have caused quite a stir in academia and beyond. Improving science starts with educating Ph.D. candidates: the scholars of tomorrow. In four studies concerning 765 Ph.D. candidates, we investigate whether Ph.D. candidates can differentiate between ethical and unethical or even fraudulent research practices. We probed the Ph.D. candidates' willingness to publish research from such practices and tested whether this is influenced by (un)ethical behavior pressure from supervisors or peers. Furthermore, 36 academic leaders (deans, vice-deans, and heads of research) were interviewed and asked to predict what Ph.D. candidates would answer for different vignettes. Our study shows, and replicates, that some Ph.D. candidates are willing to publish results deriving from even blatantly fraudulent behavior: data fabrication. Additionally, some academic leaders underestimated this behavior, which is alarming. Academic leaders have to keep in mind that Ph.D. candidates can be under more pressure than they realize and might be susceptible to using QRPs. As an inspiring example and to encourage others to make their Bayesian work reproducible, we published data, annotated scripts, and detailed output on the Open Science Framework (OSF).

A New Ensemble Strategy Based on Surprisingly Popular Algorithm and Classifier Prediction Confidence
  • Research Article. Haochen Shi et al., Applied Sciences, Mar 10, 2025. doi:10.3390/app15063003

Traditional ensemble methods rely on majority voting, which may fail to recognize correct answers held by a minority in scenarios requiring specialized knowledge. Therefore, this paper proposes two novel ensemble methods for supervised classification, named Confidence Truth Serum (CTS) and Confidence Truth Serum with Single Regression (CTS-SR). The former is based on the principles of Bayesian Truth Serum (BTS) and introduces classification confidence to calculate the prior and posterior probabilities of events, enabling the recovery of correct judgments provided by a confident minority beyond majority voting. CTS-SR further simplifies the algorithm by constructing a single regression model to reduce computational overhead, making it suitable for large-scale applications. Experiments are conducted on multiple binary classification datasets to evaluate CTS and CTS-SR. Experimental results demonstrate that, compared with existing ensemble methods, both of the proposed methods significantly outperform baseline algorithms in terms of accuracy and F1 scores. Specifically, there is an average improvement of 2–6% in accuracy and an average increase of 2–4% in F1 score. Notably, on the Musk and Hilly datasets, our method achieves a 5% improvement compared to the traditional majority voting approach. Particularly on the Hilly dataset, which generally exhibits the poorest classification performance and poses the greatest prediction challenges, our method demonstrates the best discriminative performance, validating the importance of confidence as a feature in ensemble learning.

Lifecycle cost risk analysis for infrastructure projects with modified Bayesian networks
  • Research Article. Nini Xia et al., Journal of Engineering, Design and Technology, Feb 6, 2017. doi:10.1108/jedt-05-2015-0033

Purpose: Previous research offers little specific guidance on how to improve risk analysis for large infrastructure projects. This paper aims to propose a practical risk analysis framework across the project lifecycle using Bayesian Networks (BNs).
Design/methodology/approach: The framework includes three phases. In the qualitative phase, primary risks were identified through literature reviews and interviews; questionnaires were used to determine key risks at each project stage and the causal relationships between stage-related risks. In the quantitative phase, brainstorming, questionnaires, and techniques such as ranked nodes/paths, risk maps, and the Bayesian truth serum were adopted. A BN-based risk assessment model was then developed, and risk analysis was conducted with the AgenaRisk software.
Findings: Twenty key risks across the lifecycle were determined: some risks were recurring, and different risks emerged at various stages, with the construction and feasibility stages the most risky. Results showed that risks in earlier stages significantly amplified risks in subsequent stages. Based on the causality of stage-related risks, a qualitative model was easily constructed. Ranked nodes/paths facilitated the quantification by requiring less statistical knowledge and fewer parameters than traditional BNs. As illustrated by a case, this model yielded very simple and easy-to-understand representations of risks and risk propagation pathways.
Originality/value: Little research has developed a BN risk assessment model from the perspective of project stages. A structured model, comprising a propagation network among individual risks, stage-related risks, and the final adverse consequence, has been designed. This research provides practitioners with a realistic risk assessment approach and a further understanding of dynamic and stage-related risks throughout the lifecycle of large infrastructure projects. The framework can be modified and used in other real-world risk analyses where risks are complex and develop in stages.

More from: Advances in Methods and Practices in Psychological Science
  • Do Musicians Have Better Short-Term Memory Than Nonmusicians? A Multilab Study. Massimo Grassi et al., Oct 1, 2025. doi:10.1177/25152459251379432
  • A Tutorial on Distribution-Free Uncertainty Quantification Using Conformal Prediction. Tim Kaiser et al., Oct 1, 2025. doi:10.1177/25152459251380452
  • Consistent and Precise Description of Research Outputs Could Improve Implementation of Open Science. Evan Mayo-Wilson et al., Oct 1, 2025. doi:10.1177/25152459251375445
  • Citing Decisions in Psychology: A Roadblock to Cumulative and Inclusive Science. Katherine M. Lawson et al., Jul 1, 2025. doi:10.1177/25152459251351287
  • A Fragmented Field: Construct and Measure Proliferation in Psychology. Farid Anvari et al., Jul 1, 2025. doi:10.1177/25152459251360642
  • Does Truth Pay? Investigating the Effectiveness of the Bayesian Truth Serum With an Interim Payment: A Registered Report. Claire M. Neville et al., Jul 1, 2025. doi:10.1177/25152459251343043
  • The DECIDE Framework: Describing Ethical Choices in Digital-Behavioral-Data Explorations. Heather Shaw et al., Jul 1, 2025. doi:10.1177/25152459251361013
  • Large Language Models for Psychological Assessment: A Comprehensive Overview. Jocelyn Brickman et al., Jul 1, 2025. doi:10.1177/25152459251343582
  • On Partial Versus Full Mediation and the Importance of Effect Sizes. Thomas Ledermann et al., Jul 1, 2025. doi:10.1177/25152459251355585
  • Bestiary of Questionable Research Practices in Psychology. Tamás Nagy et al., Jul 1, 2025. doi:10.1177/25152459251348431
