Machine learning prediction of blood alcohol concentration: a digital signature of smart-breathalyzer behavior

Kirstin Aschbacher, Christian S Hendershot, Gregory M Marcus, Judith A Hahn, Geoffrey Tison, Jeffrey E Olgin, Robert Avram

doi:10.1038/s41746-021-00441-4

Abstract

Excess alcohol use is an important determinant of death and disability. Machine learning (ML)-driven interventions leveraging smart-breathalyzer data may help reduce these harms. We developed a digital phenotype of long-term smart-breathalyzer behavior, trained on data from a smart breathalyzer, to predict individuals' breath alcohol concentration (BrAC) levels. We analyzed roughly one million datapoints from 33,452 users of a commercial smart-breathalyzer device, collected between 2013 and 2017. For validation, we analyzed the associations between state-level observed smart-breathalyzer BrAC levels and impaired-driving motor vehicle death rates. Behavioral, geolocation-based, and time-series-derived features were fed to an ML algorithm using training (70% of the cohort), development (10% of the cohort), and test (20% of the cohort) sets to predict the likelihood of a BrAC at or above the legal driving limit (0.08 g/dL). States with higher average BrAC levels had significantly higher alcohol-related driving death rates, adjusted for the number of users per state (B (SE) = 91.38 (15.16), p < 0.01). In the independent test set, the ML algorithm predicted the likelihood that a given user-initiated BrAC sample would be ≥ 0.08 g/dL with an area under the curve (AUC) of 85%. Highly predictive features included users' prior BrAC trends, their subjective estimation of their own BrAC (without the self-estimate, AUC = 82%), engagement and self-monitoring, time since the last measure, and hour of the day. In conclusion, an ML algorithm successfully quantified a digital phenotype of behavior, predicting naturalistic BrAC levels ≥ 0.08 g/dL (a threshold associated with alcohol-related harm) with good discrimination. This result establishes a foundation for future research on precision behavioral medicine digital health interventions using smart breathalyzers and passive monitoring approaches.
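The evaluation design described in the abstract (a 70/10/20 partition of the cohort, with discrimination measured by AUC) can be sketched in a few lines. This is a minimal sketch, not the authors' pipeline: it assumes the partition is made at the user level, so that no user's readings appear in more than one set, and it computes AUC as the probability that a randomly chosen positive sample (BrAC ≥ 0.08 g/dL) receives a higher predicted score than a randomly chosen negative one. The function names are hypothetical, and the sensitivity/specificity helper assumes an arbitrary decision threshold on the predicted probability.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_users(user_ids: Sequence[str], seed: int = 0,
                train_frac: float = 0.7, dev_frac: float = 0.1) -> Dict[str, List[str]]:
    """Assign each unique user to exactly one of train/dev/test (assumed user-level split)."""
    users = sorted(set(user_ids))          # deduplicate, then sort so the shuffle is reproducible
    random.Random(seed).shuffle(users)
    n_train = round(train_frac * len(users))
    n_dev = round(dev_frac * len(users))
    return {
        "train": users[:n_train],
        "dev": users[n_train:n_train + n_dev],
        "test": users[n_train + n_dev:],   # the remainder (about 20%) is held out
    }


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:                          # O(P * N) pairwise comparison, fine for a sketch
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels: Sequence[int], scores: Sequence[float],
                            threshold: float = 0.5) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    return tp / (tp + fn), tn / (tn + fp)
```

For example, `split_users([f"u{i}" for i in range(100)])` yields 70, 10, and 20 users, and `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75. For roughly a million samples, a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` would replace the quadratic loop.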

Highlights

According to the World Health Organization, harmful use of alcohol accounts for 5% of the global disease burden, or 1 in 20 deaths¹
To enable real-time intervention, a machine learning (ML) model would need to predict whether future blood alcohol concentration (BAC) will cross risk thresholds, with reasonably high sensitivity and specificity, based on minimal information
We sought to investigate whether breath alcohol concentration (BrAC) levels associated with alcohol-related harms (BrAC ≥ 0.08 g/dL)⁵ can be predicted with reasonable accuracy in a large, international sample of smart-breathalyzer users, given behavioral, geolocation, and temporal data related to device and app usage
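The prediction target here is binary (BrAC ≥ 0.08 g/dL), and the abstract lists time since the last measure, hour of the day, and prior BrAC trends among the most predictive features. The sketch below shows one way such per-sample features might be derived from each user's earlier readings only, so that no future information leaks into a prediction. The record layout, the helper `build_rows`, and the use of the previous reading and the mean of all prior readings as stand-ins for "prior BrAC trends" are hypothetical choices for illustration; the paper's geolocation, engagement, and self-estimate features are omitted.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional

LEGAL_LIMIT_G_DL = 0.08  # risk threshold used in the paper (g/dL)


@dataclass
class Sample:                      # hypothetical raw record: one breathalyzer reading
    user_id: str
    timestamp: datetime
    brac: float                    # measured BrAC in g/dL


@dataclass
class Row:                         # hypothetical model input for one reading
    user_id: str
    hour_of_day: int
    hours_since_last: Optional[float]   # None for a user's first reading
    prev_brac: Optional[float]          # assumed proxy for "prior BrAC trends"
    mean_prior_brac: Optional[float]    # assumed proxy for "prior BrAC trends"
    label: int                          # 1 if BrAC >= 0.08 g/dL, else 0


def build_rows(samples: List[Sample]) -> List[Row]:
    """Features use only each user's earlier readings; rows come back sorted by (user, time)."""
    rows: List[Row] = []
    history: Dict[str, List[Sample]] = {}   # user_id -> earlier readings in time order
    for s in sorted(samples, key=lambda x: (x.user_id, x.timestamp)):
        prior = history.setdefault(s.user_id, [])
        if prior:
            last = prior[-1]
            gap: Optional[float] = (s.timestamp - last.timestamp).total_seconds() / 3600
            prev: Optional[float] = last.brac
            mean_prior: Optional[float] = sum(p.brac for p in prior) / len(prior)
        else:
            gap = prev = mean_prior = None
        rows.append(Row(s.user_id, s.timestamp.hour, gap, prev, mean_prior,
                        int(s.brac >= LEGAL_LIMIT_G_DL)))
        prior.append(s)                      # current reading becomes history for later ones
    return rows
```

Computing features strictly from earlier readings mirrors the real-time setting described above, in which a model could only use information available before the next measurement.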

Summary

INTRODUCTION

According to the World Health Organization, harmful use of alcohol accounts for 5% of the global disease burden, or 1 in 20 deaths. Naturalistic data from personal breathalyzers are, by definition, obtained only during user-initiated drinking episodes. Despite this limitation, these data might inform the development of ML-based harm-reduction interventions (e.g., predicting which drinking episodes are more likely to result in higher BrAC). To test whether naturalistic patterns of commercial smart-breathalyzer use are associated with population-based health outcomes, we performed a regression analysis of 53,674 BrAC observations from 2641 distinct users and observed a significant association between higher average BrAC levels within our cohort and higher intoxicated-driving-related motor vehicle mortality rates. To our knowledge, this is the first and only study of naturalistic, population-based BrAC data recorded in real time during drinking events.
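The state-level validation relates impaired-driving death rates to average BrAC while adjusting for the number of users per state, and reports the coefficient as B (SE). A minimal sketch, assuming an ordinary least-squares model with an intercept and classical (homoskedastic) standard errors; the exact model specification is not given in this section, and the helper names `ols` and `_invert` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _invert(m: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inversion with partial pivoting (intended for small matrices)."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]          # scale pivot row so the pivot is 1
        for r in range(n):
            if r != col:                          # eliminate this column from other rows
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> Tuple[List[float], List[float]]:
    """OLS with an intercept; returns (coefficients, standard errors), intercept first."""
    rows = [[1.0, *map(float, x)] for x in X]     # prepend the intercept column
    n, k = len(rows), len(rows[0])
    if n <= k:
        raise ValueError("need more observations than parameters")
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    inv = _invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # unbiased residual variance
    se = [math.sqrt(sigma2 * inv[i][i]) for i in range(k)]
    return beta, se
```

Under these assumptions, each row of `X` would hold one state's (mean BrAC, number of users) and `y` its impaired-driving death rate; `beta[1]` and `se[1]` then play the roles of B and SE for average BrAC.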

RESULTS

Majority class

DISCUSSION

Limitations

CODE AVAILABILITY