Abstract
Background: Differential isoform usage is an important driver of inter-individual phenotypic diversity and is linked to various diseases and traits. However, accurately detecting the differential usage of gene transcripts between groups can be difficult, particularly in less well-annotated genomes where the spectrum of transcript isoforms is largely unknown.
Results: We investigated whether machine learning approaches can detect differential isoform usage based purely on the distribution of reads across a gene region. We illustrate that gradient boosting and elastic net approaches can successfully identify large numbers of genes showing potential differential isoform usage between Europeans and Africans. These genes are enriched among relevant biological pathways and significantly overlap those identified by previous approaches. We demonstrate that diversity at the 3′ and 5′ ends of genes is a primary driver of these differences between populations.
Conclusion: Machine learning methods can effectively detect differential isoform usage from read fraction data, and can provide novel insights into the biological differences between groups.
Highlights
The number of unique mRNA isoforms encoded by the human genome is estimated to be 3–10 times higher than the number of genes [1, 2].
Population RNA-seq datasets: To investigate the ability of machine learning (ML) approaches to detect differential isoform usage from windowed read proportions, we used the GEUVADIS RNA sequencing (RNA-seq) dataset generated by Lappalainen et al. [13].
In this study we aimed to investigate whether ML approaches may have utility in detecting differential isoform usage based purely on the distribution of reads across a gene region, even when the precise isoforms are unknown.
Summary
The number of unique mRNA isoforms encoded by the human genome is estimated to be 3–10 times higher than the number of genes [1, 2]. This transcript diversity enables increased downstream phenotypic complexity through the expansion of the set of proteins encoded by a comparatively small set of genes [3, 4]. Conclusion: Machine learning methods can effectively detect differential isoform usage from read fraction data, and can provide novel insights into the biological differences between groups.
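To make "read fraction data" concrete, the following is a minimal sketch of how windowed read proportions across a gene region might be computed. It is not the authors' pipeline: the function name `windowed_read_fractions`, the choice of equal-width windows, the default of 10 windows, the half-open coordinate convention, and the use of a single position per read are all assumptions made for illustration.

```python
def windowed_read_fractions(read_positions, gene_start, gene_end, n_windows=10):
    """Return the fraction of a gene's reads that fall in each window.

    Hypothetical helper: the gene region [gene_start, gene_end) is split
    into n_windows equal-width windows, and each read is represented by a
    single coordinate (for example, its alignment start).
    """
    if gene_end <= gene_start:
        raise ValueError("gene_end must be greater than gene_start")
    if n_windows < 1:
        raise ValueError("n_windows must be at least 1")

    length = gene_end - gene_start
    counts = [0] * n_windows

    for pos in read_positions:
        # Ignore reads that fall outside the gene region.
        if gene_start <= pos < gene_end:
            # Integer arithmetic maps the position to a window index in 0..n_windows-1.
            idx = (pos - gene_start) * n_windows // length
            counts[idx] += 1

    total = sum(counts)
    if total == 0:
        # No reads in the region: return all zeros rather than dividing by zero.
        return [0.0] * n_windows

    # Normalising by the gene's total read count removes overall expression
    # level, leaving only the shape of the read distribution along the gene.
    return [c / total for c in counts]


# Example: four reads across a 100 bp region split into four windows.
fractions = windowed_read_fractions([100, 110, 150, 190], 100, 200, n_windows=4)
print(fractions)
```

Under these assumptions, each individual yields one such vector per gene, and the vectors from two groups could then be passed as features to a classifier such as gradient boosting or an elastic net model.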