Abstract

Background: Differential isoform usage is an important driver of inter-individual phenotypic diversity and is linked to various diseases and traits. However, accurately detecting differential usage of gene transcripts between groups can be difficult, particularly in less well annotated genomes where the spectrum of transcript isoforms is largely unknown.

Results: We investigated whether machine learning approaches can detect differential isoform usage based purely on the distribution of reads across a gene region. We show that gradient boosting and elastic net approaches can successfully identify large numbers of genes showing potential differential isoform usage between Europeans and Africans; these genes are enriched in relevant biological pathways and significantly overlap those identified by previous approaches. We demonstrate that diversity at the 3′ and 5′ ends of genes is a primary driver of these differences between populations.

Conclusion: Machine learning methods can effectively detect differential isoform usage from read fraction data and can provide novel insights into the biological differences between groups.

Highlights

  • The number of unique mRNA isoforms encoded by the human genome is estimated to be 3–10 times higher than the number of genes [1, 2]

  • Population RNA-seq datasets: To investigate the ability of machine learning (ML) approaches to detect differential isoform usage from windowed read proportions, we used the GEUVADIS RNA sequencing (RNA-seq) dataset generated by Lappalainen et al. [13]

  • In this study we aimed to investigate whether ML approaches may have utility in detecting differential isoform usage based purely on the distribution of reads across a gene region, even when the precise isoforms are unknown
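
As an illustration of what "windowed read proportions" could look like in practice, the sketch below splits a gene region into equal-width windows and computes the fraction of reads that fall in each one. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function name `window_read_fractions`, the choice of equal-width windows, the default window count, and the half-open coordinate convention are all hypothetical choices made for the example.

```python
from typing import List, Sequence


def window_read_fractions(read_positions: Sequence[int], gene_start: int,
                          gene_end: int, n_windows: int = 10) -> List[float]:
    """Return the fraction of a gene's reads falling in each equal-width window.

    Assumption (not from the paper): coordinates are half-open, so a read
    counts only if gene_start <= position < gene_end.
    """
    if gene_end <= gene_start or n_windows <= 0:
        raise ValueError("need gene_end > gene_start and n_windows > 0")
    length = gene_end - gene_start
    counts = [0] * n_windows
    for pos in read_positions:
        if gene_start <= pos < gene_end:
            # Integer arithmetic maps the offset into a window index 0..n_windows-1.
            idx = (pos - gene_start) * n_windows // length
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        # No reads in the gene region: return all zeros rather than divide by zero.
        return [0.0] * n_windows
    return [c / total for c in counts]


# Two hypothetical samples over the same 100 bp gene region, 4 windows each.
even_sample = window_read_fractions([5, 30, 60, 90], 0, 100, n_windows=4)
end_heavy_sample = window_read_fractions([5, 90, 95, 99], 0, 100, n_windows=4)
```

Because each vector sums to 1, it describes where reads fall within the gene rather than how highly the gene is expressed. One such vector per sample could then serve as the feature row for a classifier, such as the elastic net or gradient boosting models named in the abstract, that tries to predict group membership.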


Summary

Introduction

The number of unique mRNA isoforms encoded by the human genome is estimated to be 3–10 times higher than the number of genes [1, 2]. This transcript diversity enables increased downstream phenotypic complexity through the expansion of the set of proteins encoded by a comparatively small set of genes [3, 4].
