Leveraging Genetic Reports and Electronic Health Records for the Prediction of Primary Cancers: Algorithm Development and Validation Study.

Nansu Zong, Guoqian Jiang, Ming Huang, Yue Yu, Victoria Ngo, Yiqing Zhao, Sijia Liu, Chen Wang, Daniel J Stone, Andrew Wen

doi:10.2196/23586

Open Access

https://doi.org/10.2196/23586

Journal: JMIR Medical Informatics
Publication Date: May 25, 2021
Citations: 13
License type: CC BY

Affiliation: University of California, Davis

Abstract

Background: Precision oncology has the potential to leverage clinical and genomic data in advancing disease prevention, diagnosis, and treatment. A key research area focuses on the early detection of primary cancers and potential prediction of cancers of unknown primary in order to facilitate optimal treatment decisions.

Objective: This study presents a methodology to harmonize phenotypic and genetic data features to classify primary cancer types and predict cancers of unknown primaries.

Methods: We extracted genetic data elements from oncology genetic reports of 1011 patients with cancer and their corresponding phenotypical data from Mayo Clinic’s electronic health records. We modeled both genetic and electronic health record data with HL7 Fast Healthcare Interoperability Resources. The semantic web Resource Description Framework was employed to generate the network-based data representation (ie, patient-phenotypic-genetic network). Based on the Resource Description Framework data graph, Node2vec graph-embedding algorithm was applied to generate features. Multiple machine learning and deep learning backbone models were compared for cancer prediction performance.

Results: With 6 machine learning tasks designed in the experiment, we demonstrated the proposed method achieved favorable results in classifying primary cancer types (area under the receiver operating characteristic curve [AUROC] 96.56% for all 9 cancer predictions on average based on the cross-validation) and predicting unknown primaries (AUROC 80.77% for all 8 cancer predictions on average for real-patient validation). To demonstrate the interpretability, 17 phenotypic and genetic features that contributed the most to the prediction of each cancer were identified and validated based on a literature review.

Conclusions: Accurate prediction of cancer types can be achieved with existing electronic health record data with satisfactory precision. The integration of genetic reports improves prediction, illustrating the translational values of incorporating genetic tests early at the diagnosis stage for patients with cancer.
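
The graph-to-features step described in the Methods can be illustrated with a small sketch. It is not the authors' implementation: the triples, node identifiers, predicate names, and all parameter values below are hypothetical, and the walk generator uses uniform random walks, which correspond to Node2vec with its return and in-out parameters both set to 1 (the abstract does not state which settings were used). Only the walk-generation half is shown; learning vectors from the walks is left out.

```python
import random
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples standing in for an RDF graph.
# All identifiers and predicate names are illustrative, not taken from the study.
TRIPLES = [
    ("patient:1", "hasPhenotype", "phenotype:A"),
    ("patient:1", "hasVariant", "gene:X"),
    ("patient:2", "hasPhenotype", "phenotype:A"),
    ("patient:2", "hasVariant", "gene:Y"),
    ("patient:3", "hasVariant", "gene:X"),
]


def build_adjacency(triples):
    """Collapse triples into an undirected adjacency list, ignoring predicates."""
    adjacency = defaultdict(set)
    for subject, _predicate, obj in triples:
        adjacency[subject].add(obj)
        adjacency[obj].add(subject)
    # Sort neighbours so that walks are reproducible for a fixed seed.
    return {node: sorted(neighbours) for node, neighbours in adjacency.items()}


def random_walks(adjacency, walk_length=5, walks_per_node=2, seed=0):
    """Generate uniform random walks (Node2vec with p = q = 1) from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in sorted(adjacency):
            walk = [start]
            while len(walk) < walk_length:
                # Step to a uniformly chosen neighbour of the current node.
                walk.append(rng.choice(adjacency[walk[-1]]))
            walks.append(walk)
    return walks


adjacency = build_adjacency(TRIPLES)
walks = random_walks(adjacency)
```

In a full pipeline, walks like these would typically be passed to a skip-gram model so that each node, including each patient node, receives a fixed-length vector that a downstream classifier can consume.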

Highlights

Cancer is the second leading cause of death worldwide [1]
We proposed a network-based framework (Figure 1) that represented cancer data using the Fast Healthcare Interoperability Resources (FHIR) standard and Resource Description Framework (RDF) to facilitate the cancer prediction process
We modeled cancer prediction as a multiclass classification problem, where a given patient was represented with k-dimensional features, and a model categorized the patient into precisely 1 of 9 cancer types: colon cancer (ICD-9: 153.9), pancreas cancer (ICD-9: 157.9), ovary cancer (ICD-9: 183), prostate cancer (ICD-9: 185), connective and other soft tissue cancer (ICD-9: 171.9), thyroid gland cancer (ICD-9: 193), breast cancer (ICD-9: 174.9), liver cancer (ICD-9: 155), and bronchus and lung cancer (ICD-9: 162.9)
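
The 9-way classification setup and an averaged AUROC can be sketched as follows. The ICD-9 codes and cancer types come from the list above; everything else is an assumption made for illustration: the class ordering, the helper names, the toy scores, and the choice of a macro average over one-vs-rest AUROCs (this page does not say exactly how the reported averages were computed).

```python
# ICD-9 codes and cancer types as listed in the text; the class order is arbitrary.
CANCER_TYPES = {
    "153.9": "colon",
    "157.9": "pancreas",
    "183": "ovary",
    "185": "prostate",
    "171.9": "connective and other soft tissue",
    "193": "thyroid gland",
    "174.9": "breast",
    "155": "liver",
    "162.9": "bronchus and lung",
}
CLASS_INDEX = {code: i for i, code in enumerate(CANCER_TYPES)}


def binary_auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        return None  # undefined when one of the two groups is empty
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def macro_ovr_auroc(true_classes, score_rows, n_classes):
    """Average one-vs-rest AUROC over the classes for which it is defined."""
    per_class = []
    for c in range(n_classes):
        labels = [1 if y == c else 0 for y in true_classes]
        scores = [row[c] for row in score_rows]
        auc = binary_auroc(labels, scores)
        if auc is not None:
            per_class.append(auc)
    return sum(per_class) / len(per_class)


def toy_scores(predicted_code, confidence=0.8):
    """Hypothetical model output: most of the mass on one class, the rest spread evenly."""
    rest = (1 - confidence) / (len(CLASS_INDEX) - 1)
    return [confidence if i == CLASS_INDEX[predicted_code] else rest
            for i in range(len(CLASS_INDEX))]


# Four hypothetical patients; the last one is misclassified as prostate cancer.
true_classes = [CLASS_INDEX[c] for c in ["153.9", "185", "174.9", "153.9"]]
score_rows = [toy_scores(c) for c in ["153.9", "185", "174.9", "185"]]
macro_auroc = macro_ovr_auroc(true_classes, score_rows, len(CLASS_INDEX))
```

Skipping classes that have no positive examples in the evaluation set is one possible convention; other conventions would change the average on small samples like this toy one.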

Summary

Introduction

Cancer is the second leading cause of death worldwide [1]. The health burden of cancer in the United States is substantial [2,3], with approximately 1.8 million new diagnoses and an estimated 600,000 deaths in 2020 alone [4]. A key research area focuses on the early detection of primary cancers and potential prediction of cancers of unknown primary in order to facilitate optimal treatment decisions.

Objectives

This study presents a methodology to harmonize phenotypic and genetic data features to classify primary cancer types and predict cancers of unknown primaries.

Methods

We extracted genetic data elements from oncology genetic reports of 1011 patients with cancer and their corresponding phenotypical data from Mayo Clinic’s electronic health records.

Results

With 6 machine learning tasks designed in the experiment, we demonstrated the proposed method achieved favorable results in classifying primary cancer types (area under the receiver operating characteristic curve [AUROC] 96.56% for all 9 cancer predictions on average based on the cross-validation) and predicting unknown primaries (AUROC 80.77% for all 8 cancer predictions on average for real-patient validation).

Conclusion

The integration of genetic reports improves prediction, illustrating the translational values of incorporating genetic tests early at the diagnosis stage for patients with cancer.

Similar Papers

Identify Cancer Patients at Risk for Heart Failure using Electronic Health Record and Genetic Data
Zehao Yu ... Xi Yang
01 Jun 2022

ACTION-EHR: Patient-Centric Blockchain-Based Electronic Health Record Data Management for Cancer Care.
Alevtina Dubovitskaya ... Nitesh Idnani
Journal of medical Internet research | VOL. 22
21 Aug 2020

Developing a scalable FHIR-based clinical data normalization pipeline for standardizing and integrating unstructured and structured electronic health record data.
Na Hong ... Feichen Shen
JAMIA Open | VOL. 2
18 Oct 2019

When Does Size Matter? Promises, Pitfalls, and Appropriate Interpretation of “Big” Medical Records Data
Kathryn Rough ... John T Thompson
Ophthalmology | VOL. 125
23 Jul 2018
