Feature Extraction for Heroin-Use Classification Using Imbalanced Random Forest Methods

Matthew Beattie, Charles Nicholson

doi:10.1080/10826084.2020.1843058

Abstract

Background and aims: The National Survey on Drug Use and Health (NSDUH) contains a large number of responses and many features. This study aims to identify features from within NSDUH that are important in classifying heroin use. Proper implementation of random forest (RF) techniques copes with the highly imbalanced nature of heroin usage among respondents to identify features that are prominent in classification models involving nonlinear combinations of predictive variables. To date, methods for the proper application of RF to imbalanced medical datasets have not been defined.

Methods: Three different RF classification techniques are applied to the 2016 NSDUH. The techniques are compared using scoring criteria, including area under the precision recall curve (AUPRC), to identify the best model. Variable importance scores (VIS) are checked for stability across the three models and the VIS from the best model are used to highlight features and categories of features that most influence the classification of heroin users.

Findings: The best performing method was RF with random oversampling (AUPRC = 0.5437). The category of features regarding other drug use was most important (average z-scored VIS = 1.66) followed by age-of-first-use features (0.32). The most important individual feature was cocaine usage (z-scored VIS = 11.05), followed by crack usage (6.51). The most important individual feature other than specific drug use flags was the use of marijuana under the age of 18 (3.11). This study demonstrates a method for the use of RF in feature extraction from imbalanced medical datasets with many predictors.
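The abstract names three building blocks without giving their details: random oversampling of the minority class, scoring by AUPRC, and z-scoring of variable importance scores. The sketch below illustrates each one in plain Python under stated assumptions. It is not the authors' implementation: the function names are hypothetical, the toy data are made up, the random forest fit itself is left out, and AUPRC is approximated by step-wise average precision, which may differ from the estimator used in the paper.

```python
import random
import statistics


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows (drawn with replacement) until both
    classes have the same count. Assumes binary 0/1 labels and at least
    one row in each class."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = list(range(len(labels))) + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]


def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.
    Ties in scores are broken by input order; needs at least one positive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / sum(labels)


def z_score(values):
    """Standardize importance scores to mean 0 and unit (population) SD."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


# Toy data (made up): 1 positive among 10 respondents.
toy_labels = [1] + [0] * 9
toy_rows = [[float(i)] for i in range(10)]
balanced_rows, balanced_labels = random_oversample(toy_rows, toy_labels)

# Hypothetical raw importance scores for three features.
toy_vis = z_score([1.0, 2.0, 3.0])
```

Oversampling would normally be applied to the training split only, so that duplicated minority rows do not leak into the data used to compute AUPRC.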

Journal: Substance Use & Misuse
Publication Date: Nov 12, 2020
Citations: 3
