Abstract

Adoption of Electronic Health Record (EHR) systems has led to collection of massive healthcare data, which creates oppor- tunities and challenges to study them. Computational phenotyping offers a promising way to convert the sparse and complex data into meaningful concepts that are interpretable to healthcare givers to make use of them. We propose a novel su- pervised nonnegative tensor factorization methodology that derives discriminative and distinct phenotypes. We represented co-occurrence of diagnoses and prescriptions in EHRs as a third-order tensor, and decomposed it using the CP algorithm. We evaluated discriminative power of our models with an Intensive Care Unit database (MIMIC-III) and demonstrated superior performance than state-of-the-art ICU mortality calculators (e.g., APACHE II, SAPS II). Example of the resulted phenotypes are sepsis with acute kidney injury, cardiac surgery, anemia, respiratory failure, heart failure, cardiac arrest, metastatic cancer (requiring ICU), end-stage dementia (requiring ICU and transitioned to comfort-care), intraabdominal conditions, and alcohol abuse/withdrawal.

Highlights

  • Phenotyping based on machine learning has been proposed to facilitate extraction of meaningful phenotypes automatically from Electronic Health Record (EHR) without human supervision through a process called computational phenotyping

  • One of important characteristics that phenotypes should have is to be discriminative to a certain clinical outcome of interest such as mortality, readmission, cost, et al So far, there has been little consideration about discriminative phenotypes associated with certain clinical outcomes

  • The discriminative phenotypes can be beneficial to clinicians because they can directly apply the phenotypes to their daily practice to improve the clinical outcome of interest

Read more

Summary

Introduction

Phenotyping based on machine learning has been proposed to facilitate extraction of meaningful phenotypes automatically from EHRs without human supervision through a process called computational phenotyping. A natural language processing technique extracts frequent terms from clinical narrative notes and defines phenotypes as a set of relevant and frequent terms[12,13,14] These frequent set mining methods are useful but unable to learn underlying latent characteristics. Dimensionality reduction phenotyping methods have been introduced to handle sparse and noisy data from EHRs’ large and heterogeneous features These methods represent phenotypes as latent medical concepts[18]. Nonnegative tensor factorization (NTF) is becoming popular due to its ability to capture high dimensional data It generates latent medical concepts using interaction between components from multiple information source[23,24,25,26,27]. We found cluster structures of diagnoses and prescriptions using contextual similarity between the components, and absorbed the cluster structure into the tensor decomposition process

Objectives
Methods
Results
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.