Abstract

The availability of Electronic Health Records (EHR) in health care settings provides substantial opportunities for the early detection of patients' potential diseases. Many data mining tools have been adopted for EHR-based early disease detection, and Linear Discriminant Analysis (LDA) is one of the most widely used statistical prediction methods among them. To improve the performance of LDA for early detection of diseases, we propose CRDA, a Covariance-Regularized LDA classifier built on a diagnosis-frequency vector data representation. Specifically, CRDA employs a sparse precision matrix estimator derived from the graphical lasso to boost the accuracy of LDA classifiers. Our algorithmic analysis shows that the error bound of the graphical lasso estimator can lower the misclassification rate of LDA models. We performed an extensive evaluation of CRDA using CHSN, a large-scale real-world EHR dataset, to predict mental health disorders (e.g., depression and anxiety) in college students from 10 US universities. We compared CRDA with other regularized LDA classifiers and downstream classifiers. The results show that CRDA outperforms all baselines, achieving significantly higher accuracy and F1 scores.
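To make the pipeline described in the abstract concrete, the following is a minimal sketch of two of its pieces: building a diagnosis-frequency vector from a patient's visit history, and applying the LDA decision rule with a plug-in precision matrix. It is not the authors' implementation. The diagnosis codes, class means, priors, and function names are hypothetical, and the sparse precision matrix is written by hand as a stand-in for what a graphical lasso estimator would produce from training data.

```python
from collections import Counter
import math


def frequency_vector(visits, vocabulary):
    """Count how often each diagnosis code appears across a patient's visits."""
    counts = Counter(code for visit in visits for code in visit)
    return [float(counts[code]) for code in vocabulary]


def lda_score(x, mean, precision, prior):
    """Linear discriminant score: x^T P mu - 0.5 * mu^T P mu + log(prior)."""
    # P @ mu, computed row by row.
    p_mu = [sum(p * m for p, m in zip(row, mean)) for row in precision]
    linear = sum(xi * v for xi, v in zip(x, p_mu))
    offset = 0.5 * sum(mi * v for mi, v in zip(mean, p_mu))
    return linear - offset + math.log(prior)


def classify(x, means, precision, priors):
    """Return the label whose discriminant score is largest."""
    scores = {
        label: lda_score(x, means[label], precision, priors[label])
        for label in means
    }
    return max(scores, key=scores.get)


# Hypothetical diagnosis vocabulary and class statistics (assumptions).
VOCABULARY = ["A", "B", "C"]
MEANS = {0: [0.0, 0.0, 0.0], 1: [2.0, 1.0, 0.0]}
PRIORS = {0: 0.5, 1: 0.5}

# Hand-written sparse precision matrix shared by both classes; in CRDA this
# would instead be estimated from data with the graphical lasso.
PRECISION = [
    [1.0, -0.5, 0.0],
    [-0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

patient_visits = [["A", "B"], ["A"]]
x = frequency_vector(patient_visits, VOCABULARY)
predicted = classify(x, MEANS, PRECISION, PRIORS)
```

The zero entries in the precision matrix are what the sparsity penalty of the graphical lasso encourages: each zero means the two corresponding diagnosis codes are treated as conditionally independent given the rest, and so contribute no interaction term to the discriminant score.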
