Abstract

The last several years have witnessed an explosion of methods and applications for combining image data with 'omics data, and for prediction of clinical phenotypes. Much of this research has focused on cancer histology, for which genetic perturbations are large and the signal-to-noise ratio is high. Related research on chronic, complex diseases is limited by tissue sample availability, lower genomic signal strength, and the less extreme and tissue-specific nature of intermediate histological phenotypes. Data from the GTEx Consortium provide a unique opportunity to investigate the connections among phenotypic histological variation, imaging data, and 'omics profiling, across multiple tissue-specific phenotypes at the sub-clinical level. Investigating histological designations in multiple tissues, we survey the evidence for genomic association and prediction of histology, and use the results to test the limits of prediction accuracy using machine learning methods applied to the imaging data, genomics data, and their combination. We find that expression data predict pathology with accuracy similar or superior to that of our use of imaging data, despite the fact that pathological determination is made from the images themselves. A variety of machine learning methods perform similarly, while network embedding methods offer at best limited improvements. These observations hold across a range of tissues and predictor types. The results support the use of genomic measurements for prediction, and the use of the same target tissue in which pathological phenotyping has been performed. Although this last finding is sensible, to our knowledge our study is the first to demonstrate it empirically. Even though prediction accuracy remains a challenge, the results show clear evidence of pathway and tissue-specific biology.

Highlights

  • Histopathology refers to the microscopic examination of tissues in order to identify possible changes caused by disease, which is still largely conducted by human pathologists using expert judgment

  • In order to best represent prediction accuracy for relatively interpretable models, we used a combination of principal components and LASSO regression with cross-validation as initial analyses, and the area under the receiver operating characteristic curve (AUC) as the performance criterion

  • Column 4 shows the performance as measured by the area under the receiver operating characteristic (ROC) curve (AUC) from the regression model of pathology against the 10 image PCs
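Because AUC is the performance criterion named above, a small worked example may help make it concrete. The following is a minimal sketch, not the authors' code: it computes AUC from binary pathology labels and predicted scores using the rank-sum (Mann-Whitney) identity. The function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    AUC equals the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative one, with
    ties counted as one half.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")

    # Sort sample indices by score, then give tied scores their average rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # Mann-Whitney U statistic for the positive class, scaled to [0, 1].
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


# Hypothetical toy data: 1 = pathology present, 0 = absent.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

In a cross-validated setting, a function like this would be applied to held-out predictions only, so that the reported AUC reflects out-of-sample performance. An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.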


Summary

INTRODUCTION

Histopathology refers to the microscopic examination of tissues in order to identify possible changes caused by disease, which is still largely conducted by human pathologists using expert judgment. The expression QTL results from GTEx v8 (GTEx Consortium et al., 2020) provide incomplete support for this hypothesis, as a large proportion of significant eQTLs appear to be common across tissues, raising the possibility of analogous findings for histopathological designations. In other words, it is unclear whether expression should be measured in the same tissue as that providing the basis for diagnosis. We perform a comprehensive investigation of six pathological designations in five GTEx tissues, exploring the limits of machine-learning prediction accuracy using imaging data, expression, and their combination.

Histopathological Data
Gene Expression Data
Integrative Analyses
RESULTS
Lung—Fibrosis
Liver—Steatosis and Congestion
Tibial Artery - Atherosclerosis
Thyroid—Hashimoto’s Disease
Adipose–Fibrosis
Overall
DISCUSSION
DATA AVAILABILITY STATEMENT