Early detection of diseases using electronic health records data and covariance-regularized linear discriminant analysis

Jiang Bian, Laura E Barnes, Guanling Chen, Haoyi Xiong

doi:10.1109/bhi.2017.7897304

Abstract

The availability of Electronic Health Records (EHR) in health care settings provides significant opportunities for the early detection of patients' potential diseases. Among the many data mining tools adopted for EHR-based early disease detection, Linear Discriminant Analysis (LDA) is one of the most widely used statistical prediction methods. To improve the performance of LDA for early detection of diseases, we propose CRDA, a Covariance-Regularized LDA classifier built on a diagnosis-frequency vector representation of the data. Specifically, CRDA employs a sparse precision matrix estimator based on the graphical lasso to boost the accuracy of LDA classifiers. Our algorithm analysis shows, intuitively, how the error bound of the graphical lasso estimator can lower the misclassification rate of LDA models. We performed an extensive evaluation of CRDA on CHSN, a large-scale real-world EHR dataset, to predict mental health disorders (e.g., depression and anxiety) in college students from 10 US universities. We compared CRDA with other regularized LDA and downstream classifiers. The results show that CRDA outperforms all baselines, achieving significantly higher accuracy and F1 scores.
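To make the idea concrete, the following is a minimal sketch of a two-class LDA rule whose precision matrix is estimated with the graphical lasso. It is an illustration under stated assumptions, not the authors' implementation: the function names `fit_crda` and `predict_crda`, the penalty value `alpha=0.1`, and the synthetic Poisson counts standing in for diagnosis-frequency vectors are all hypothetical. It uses scikit-learn's `GraphicalLasso` estimator.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso


def fit_crda(X, y, alpha=0.1):
    """Fit a two-class LDA rule with a graphical-lasso precision matrix.

    X: (n_samples, n_features) frequency vectors; y: labels in {0, 1}.
    alpha is a hypothetical choice of L1 penalty, not a value from the paper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_pos, X_neg = X[y == 1], X[y == 0]
    mu_pos, mu_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)
    # Center each class on its own mean so the estimate is a pooled
    # within-class covariance, as in standard LDA.
    centered = np.vstack([X_pos - mu_pos, X_neg - mu_neg])
    glasso = GraphicalLasso(alpha=alpha, assume_centered=True).fit(centered)
    theta = glasso.precision_  # sparse estimate of the inverse covariance
    # Linear discriminant: sign of (x - (mu_pos + mu_neg)/2)^T theta (mu_pos - mu_neg)
    # plus the log prior ratio.
    w = theta @ (mu_pos - mu_neg)
    b = -0.5 * (mu_pos + mu_neg) @ w + np.log(len(X_pos) / len(X_neg))
    return w, b


def predict_crda(model, X):
    """Return 1 where the discriminant score is positive, else 0."""
    w, b = model
    return (np.asarray(X, dtype=float) @ w + b > 0).astype(int)


# Synthetic stand-in for diagnosis-frequency vectors (not CHSN data).
rng = np.random.default_rng(0)
X = np.vstack([rng.poisson(1.0, size=(200, 5)), rng.poisson(3.0, size=(200, 5))])
y = np.array([0] * 200 + [1] * 200)
model = fit_crda(X, y)
preds = predict_crda(model, X)
```

Larger values of `alpha` push more off-diagonal entries of the precision matrix to zero. In practice the penalty would be chosen by cross-validation rather than fixed as it is here.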

Full Text