Abstract

The lack of well-structured metadata annotations complicates the reuse and interpretation of the growing amount of publicly available RNA expression data. Machine learning-based prediction of metadata (data augmentation) can considerably improve the quality of expression data annotation. In this study, we systematically benchmark deep learning (DL) and random forest (RF)-based metadata augmentation of tissue, age, and sex using small RNA (sRNA) expression profiles. We use 4243 annotated sRNA-Seq samples from the sRNA expression atlas database to train and test the augmentation performance. The DL model outperforms the RF method in almost all tested cases. The average cross-validated prediction accuracy of the DL algorithm is 96.5% for tissue, 77% for sex, and 77.2% for age. The average tissue prediction accuracy for a completely new data set is 83.1% (DL) and 80.8% (RF). To understand which sRNAs influence DL predictions, we employ backpropagation-based feature importance scores using the DeepLIFT method, which provide information on the biological relevance of sRNAs.
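The "average cross-validated prediction accuracy" reported above can be illustrated with a minimal sketch. This is not the study's pipeline: the nearest-centroid classifier is a deliberately simple stand-in for the DL and RF models, the three-feature "expression profiles" are synthetic, and all function names (`nearest_centroid_fit`, `cross_validated_accuracy`, `make_toy_data`) are hypothetical. It only shows how per-fold accuracies are computed on held-out samples and then averaged.

```python
import random
from collections import defaultdict


def nearest_centroid_fit(X, y):
    """Return the mean profile (centroid) of each class label."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}


def nearest_centroid_predict(centroids, row):
    """Predict the label whose centroid is closest (squared Euclidean)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, row))
    return min(centroids, key=lambda lab: dist(centroids[lab]))


def cross_validated_accuracy(X, y, k=5, seed=0):
    """Average accuracy over k folds; each fold is held out exactly once."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = nearest_centroid_fit([X[i] for i in train],
                                     [y[i] for i in train])
        correct = sum(nearest_centroid_predict(model, X[i]) == y[i]
                      for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / k


def make_toy_data(n_per_class=30, seed=1):
    """Synthetic stand-in for sRNA profiles: one elevated feature per tissue."""
    rng = random.Random(seed)
    profiles = {"brain": [5.0, 1.0, 1.0],
                "liver": [1.0, 5.0, 1.0],
                "blood": [1.0, 1.0, 5.0]}
    X, y = [], []
    for tissue, mean in profiles.items():
        for _ in range(n_per_class):
            X.append([rng.gauss(m, 0.5) for m in mean])
            y.append(tissue)
    return X, y


X, y = make_toy_data()
acc = cross_validated_accuracy(X, y, k=5)
print(f"mean 5-fold accuracy: {acc:.3f}")
```

Because the toy classes are well separated, the accuracy here is close to 1. The lower accuracy the abstract reports for a completely new data set corresponds to a different evaluation, in which the model is tested on data from a source it never saw during training.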

Highlights

  • Data annotations are crucial for the reuse of data

  • We show that deep learning (DL) algorithms outperform random forest (RF)-based data augmentation for tissue, sex, and age annotations using small RNA (sRNA) expression profiles, provided that enough training data are available

Introduction

Data annotations (tissue, age, sex, etc.) are crucial for the reuse of data. A detailed description of the biological conditions under which data were obtained is required to extract new information from them. Metadata are often not stored together with the expression data, and the metadata that are available are often unnormalized, unstructured, and incomplete. The widely used Gene Expression Omnibus repository (GEO; https://www.ncbi.nlm.nih.gov/geo), for example, stores annotations as a number of free-text description fields. This leads to missing and/or inaccurate annotations and requires revisions and manual corrections by an expert (Hadley et al, 2017).
