Statistical Learning in Medical Research with Decision Threshold and Accuracy Evaluation

Sumaiya Z. Sande, Ralph D’Agostino, Loraine Seng, Jialiang Li

doi:10.6339/21-jds1022

Abstract

Machine learning methods are increasingly applied to medical data analysis to reduce human effort and improve our understanding of disease propagation. When the data are complicated and unstructured, shallow learning methods may not be suitable or feasible. Deep learning neural networks, such as the multilayer perceptron (MLP) and the convolutional neural network (CNN), have been incorporated into medical diagnosis and prognosis for better health care practice. For a binary outcome, these learning methods directly output predicted probabilities for a patient’s health condition. Investigators still need to choose an appropriate decision threshold to split the predicted probabilities into positive and negative regions. We review methods for selecting the cut-off value, including relatively automatic methods based on optimizing ROC curve criteria and utility-based methods that use a net benefit curve. In particular, decision curve analysis (DCA) is now acknowledged in medical studies as a good complement to ROC analysis for decision making. In this paper, we provide R code to illustrate how to perform the statistical learning methods, select a decision threshold to yield a binary prediction, and evaluate the accuracy of the resulting classification. This article will help medical decision makers understand different classification methods and use them in real-world scenarios.
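To make the two families of cut-off selection concrete, here is a minimal sketch in Python (not the R code the paper provides). It assumes the Youden index, J = sensitivity + specificity - 1, as one example of an ROC-based criterion, and the standard decision curve definition of net benefit, TP/n - (FP/n) * pt/(1 - pt), at threshold probability pt. The function names and the toy data are hypothetical and chosen only for illustration.

```python
def confusion_counts(y_true, p_hat, threshold):
    """Count TP, FP, TN, FN when predicting positive if p_hat >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, p_hat):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def youden_threshold(y_true, p_hat):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    Candidate thresholds are the observed predicted probabilities.
    Assumes y_true contains at least one positive and one negative.
    """
    best_t, best_j = None, float("-inf")
    for t in sorted(set(p_hat)):
        tp, fp, tn, fn = confusion_counts(y_true, p_hat, t)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        j = sensitivity + specificity - 1.0
        if j > best_j:  # ties keep the smallest threshold
            best_t, best_j = t, j
    return best_t, best_j


def net_benefit(y_true, p_hat, pt):
    """Net benefit of the model at threshold probability pt, with 0 < pt < 1."""
    tp, fp, _, _ = confusion_counts(y_true, p_hat, pt)
    n = len(y_true)
    return tp / n - (fp / n) * (pt / (1.0 - pt))


# Hypothetical toy data: true labels and predicted probabilities.
y_true = [0, 0, 0, 1, 0, 0, 1, 1, 1]
p_hat = [0.1, 0.2, 0.3, 0.4, 0.45, 0.5, 0.6, 0.7, 0.9]

threshold, j = youden_threshold(y_true, p_hat)
print("Youden threshold:", threshold, "J:", j)
for pt in (0.2, 0.5):
    print("net benefit at pt =", pt, ":", net_benefit(y_true, p_hat, pt))
```

The Youden cut-off depends only on the data, whereas net benefit must be read at a threshold probability that reflects how the decision maker weighs false positives against false negatives, which is why the two approaches are treated as complementary.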

Highlights

Data science has expanded quickly due to the increase in data storage capacities and exploration of computational technologies and algorithms
We review methods to select the cut-off values, including the relatively automatic methods based on optimization of the Receiver Operating Characteristic (ROC) curve criteria and the utility-based methods with a net benefit curve
While the Pima Indian diabetes data allow shallow learning, we focus on a case study with deep learning

Summary

Introduction

Data science has expanded quickly due to the increase in data storage capacities and exploration of computational technologies and algorithms. Data mining techniques help to obtain significant information from patient health data and make promising predictions. When data are in a standard format, e.g., accessible via an Excel sheet, most shallow learning tools can be readily applied, including the familiar logistic regression and classification trees. These methods are traditionally covered in the curriculum of most graduate programs in statistics and biostatistics. On the other hand, when the data become complicated, we may

Received May 10, 2021; Accepted August 18, 2021

Journal: Journal of Data Science
Publication Date: Jan 1, 2021
Citations: 5
License type: cc-by
