Abstract
The predictive accuracy of personality-criterion regression models may be improved with statistical learning (SL) techniques. This study introduced a novel SL technique, BISCUIT (Best Items Scale that is Cross-validated, Unit-weighted, Informative, and Transparent). The predictive accuracy and parsimony of BISCUIT were compared with those of three established SL techniques (the lasso, elastic net, and random forest) and of regression using two sets of scales (the Big Five and 27 narrow traits), for five criteria across five levels of data missingness. BISCUIT's predictive accuracy was competitive with that of the other SL techniques at higher levels of data missingness. BISCUIT most frequently produced the most parsimonious SL model. In terms of predictive accuracy, the elastic net and lasso outperformed the other techniques in the complete data condition and in conditions with up to 50% data missingness. Regression using the 27 narrow traits was intermediate in predictive accuracy. For most criteria and levels of data missingness, regression using the Big Five had the worst predictive accuracy. Overall, the loss in predictive accuracy due to data missingness was modest, even at 90% data missingness. These findings suggest that personality researchers should consider incorporating planned data missingness and SL techniques into their designs and analyses.
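The abstract gives BISCUIT only as an acronym. The following is one illustrative reading of that acronym in Python/NumPy, not the authors' implementation: each item is correlated with the criterion using pairwise-complete data (so heavy missingness is tolerated), the most strongly correlated items are kept and unit-weighted (+1 or -1), and k-fold cross-validation chooses how many items to keep. The function names (`pairwise_corr`, `fit_biscuit`, `score`, `cv_biscuit`), the 5-fold split, scoring each person as the mean of the selected items they answered, and using the out-of-fold correlation as the accuracy measure are all assumptions, as is the simulated data.

```python
import numpy as np


def pairwise_corr(X, y):
    """Correlate each item (column of X) with criterion y using only rows where
    both are observed (pairwise-complete), so sparse items can still be scored."""
    r = np.zeros(X.shape[1])  # items with too little usable overlap stay at r = 0
    for j in range(X.shape[1]):
        ok = ~np.isnan(X[:, j]) & ~np.isnan(y)
        if ok.sum() > 2 and X[ok, j].std() > 0 and y[ok].std() > 0:
            r[j] = np.corrcoef(X[ok, j], y[ok])[0, 1]
    return r


def fit_biscuit(X, y, n_items):
    """'Best items' and 'unit-weighted': keep the n_items largest |r|, weight each by its sign."""
    r = pairwise_corr(X, y)
    items = np.argsort(-np.abs(r))[:n_items]
    return items, np.sign(r[items])


def score(X, items, signs):
    """Mean of the keyed selected items over whichever of them each person answered."""
    keyed = X[:, items] * signs
    answered = ~np.isnan(keyed)
    total = np.where(answered, keyed, 0.0).sum(axis=1)
    n = answered.sum(axis=1)
    return np.where(n > 0, total / np.maximum(n, 1), np.nan)  # NaN if none answered


def cv_biscuit(X, y, sizes, k=5, seed=0):
    """'Cross-validated': pick the scale length with the highest out-of-fold validity."""
    folds = np.random.default_rng(seed).permutation(len(y)) % k
    validity = {}
    for n_items in sizes:
        rs = []
        for f in range(k):
            train, test = folds != f, folds == f
            items, signs = fit_biscuit(X[train], y[train], n_items)  # selection on training fold only
            s, yt = score(X[test], items, signs), y[test]
            ok = ~np.isnan(s) & ~np.isnan(yt)
            if ok.sum() > 2:
                rs.append(np.corrcoef(s[ok], yt[ok])[0, 1])
        validity[n_items] = float(np.mean(rs))
    return max(validity, key=validity.get), validity


# Hypothetical data: only items 0-4 carry signal about the criterion.
rng = np.random.default_rng(1)
n_people, n_pool = 500, 40
trait = rng.normal(size=n_people)
X = rng.normal(size=(n_people, n_pool))
X[:, :5] += trait[:, None]
y = trait + rng.normal(size=n_people)
X[rng.random(X.shape) < 0.7] = np.nan  # 70% of responses missing completely at random

best_n, validity = cv_biscuit(X, y, sizes=[1, 3, 5, 10, 20])
items, signs = fit_biscuit(X, y, best_n)  # final scale refit on all data
```

In this sketch the selected item indices and their signs are the entire model, which is one way to read the "Informative" and "Transparent" parts of the name.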
Highlights
Models generated by the lasso were, on average, 99.8% as predictive as the elastic net models, indicating that the predictive accuracies of the elastic net and lasso were functionally equivalent.
Results from this study indicate that statistical learning techniques could prove essential in future research on personality-criterion relationships.
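The near-equivalence of the lasso and the elastic net is less surprising given how the two are related. In the parameterization used by, for example, R's glmnet package, the elastic net for a continuous criterion minimizes the objective below, where λ ≥ 0 sets the overall penalty strength and α ∈ [0, 1] mixes the ridge (ℓ2) and lasso (ℓ1) penalties. Setting α = 1 gives the lasso exactly, and α = 0 gives ridge regression. Which α values the study searched over is not stated in this summary, so this only shows the general relationship.

```latex
\hat{\beta}_0, \hat{\beta}
  = \arg\min_{\beta_0,\, \beta}\;
    \frac{1}{2N} \sum_{i=1}^{N} \left( y_i - \beta_0 - x_i^{\top} \beta \right)^2
    + \lambda \left[ \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 + \alpha \lVert \beta \rVert_1 \right]
```

If cross-validation selects an α close to 1, the fitted elastic net and lasso models nearly coincide.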
Summary
Participant data were collected at https://sapa-project.org, an international online personality assessment. The SAPA (Synthetic Aperture Personality Assessment) Project is an ongoing research project in which each participant is given a small random sample of items from a large item pool (over 6,000 items), resulting in a massively missing completely at random (MMCAR) data structure. Requiring complete data reduced the sample to 78,828 participants. Participants were from 200 countries (57% from the US), 65% were female, and the median age was 33 years (min = 14, max = 90). Descriptive information concerning the initial and final samples is available in Table 1 in ESM 1.
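A minimal simulation of the administration scheme described above, assuming each participant receives a fixed number of items drawn uniformly at random from the pool. The function name `random_item_administration` and the sizes (2,000 people, a 200-item pool, 20 items each) are hypothetical and far smaller than the real SAPA item pool; the actual SAPA sampling rules may differ. The point is to show why this design produces data that are missing completely at random, and how few people answer any given pair of items.

```python
import numpy as np


def random_item_administration(n_people, pool_size, items_per_person, seed=0):
    """Boolean mask, True where a participant was administered an item.
    Each participant gets an independent random subset of the pool, so whether
    a response is missing is unrelated to any response (MCAR by design)."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((n_people, pool_size), dtype=bool)
    for i in range(n_people):
        mask[i, rng.choice(pool_size, size=items_per_person, replace=False)] = True
    return mask


# Hypothetical sizes: N people, a pool of P items, M items per person.
N, P, M = 2000, 200, 20
mask = random_item_administration(N, P, M)
missing_rate = 1.0 - mask.mean()  # 1 - M/P, i.e. 0.9 here

# overlap[j, k] = number of people who answered both item j and item k,
# i.e. the sample size behind each pairwise correlation.
overlap = mask.T.astype(np.int64) @ mask.astype(np.int64)
off_diag = overlap[~np.eye(P, dtype=bool)]
expected = N * M * (M - 1) / (P * (P - 1))  # average pairwise sample size
```

Under this assumed scheme, each pair of items is answered together by about N·M(M−1)/(P(P−1)) people on average, which is why pairwise-complete analyses of such data need very large samples.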