Optimal sampling for positive only electronic health record data.

Seong-H Lee, Yanyuan Ma, Jinbo Chen, Ying Wei

doi:10.1111/biom.13824

Abstract

Identifying a patient's disease/health status from electronic medical records is a frequently encountered task in electronic health record (EHR)-related research, and estimating a classification model often requires benchmark training data with patients' known phenotype statuses. However, assessing a patient's phenotype is costly and labor-intensive, so a proper selection of EHR records as a training set is desired. We propose a procedure to tailor the best training subsample with limited sample size for a classification model, minimizing its mean-squared phenotyping/classification error (MSE). Our approach incorporates "positive only" information, an approximation of the true disease status without false alarms, when it is available. In addition, our sampling procedure is applicable for training a chosen classification model, which can be misspecified. We provide theoretical justification of its optimality in terms of MSE. The performance gain from our method is illustrated through simulation and a real-data example, and is often found satisfactory under criteria beyond MSE.

Journal: Biometrics	Publication Date: Jan 11, 2023
License type: CC BY-NC-ND 4.0

Similar Papers

Comparison of Electronic Laboratory Reports, Administrative Claims, and Electronic Health Record Data for Acute Viral Hepatitis Surveillance
Joshua Allen-Dicker ... Michael Klompas
Journal of Public Health Management and Practice | VOL. 18
01 May 2012

HIR Collaborating with the CODATA Conference
Hyejung Chang ... William T F Goossen
Healthcare Informatics Research | VOL. 19
01 Jan 2013

Continuity and Completeness of Electronic Health Record Data for Patients Treated With Oral Hypoglycemic Agents: Findings From Healthcare Delivery Systems in Taiwan.
Chien-Ning Hsu ... Kelly Huang
Frontiers in Pharmacology | VOL. 13
04 Apr 2022

Harnessing the Data Universe to Understand and Reduce Clinical Deterioration in Children.
Anne Fallon ... Tina Sosa
Hospital pediatrics | VOL. 12
26 Apr 2022