Abstract

Background: RNA-seq experiments have traditionally been done with short-read sequencing technologies that, by nature, collapse all RNA isoforms for a given gene into a single expression measurement, a major oversimplification of the underlying biology. This collapsing severely limits our ability to characterize individual RNA isoforms and determine their downstream functions. While computational approaches for assembling short reads into full transcripts exist, these methods are inherently prone to structural inaccuracy, especially when compared to the full-length sequencing possible with long-read technologies. In contrast, long-read sequencing technologies can sequence entire RNA molecules, allowing researchers to accurately quantify expression for the complete set of RNA isoform species, including de novo RNA isoforms. Long-read sequencing is especially well suited for discovering novel isoforms in the recently released telomere-to-telomere (T2T) human reference genome (CHM13). The new CHM13 genome assembly resolved highly homologous regions that are challenging to study with short reads. Here we sequenced post-mortem human brain tissue with long reads and aligned them to CHM13 to explore novel gene bodies and transcript isoforms.

Methods: We sequenced prefrontal cortex tissue from four post-mortem human brain samples using Oxford Nanopore Technologies long-read sequencing of PCR-amplified cDNA. Data were basecalled using Guppy, reads were aligned to the CHM13 human reference genome using minimap2, and transcripts were assembled and quantified with the Bambu package in R.

Results: Among other findings, we discovered 84 new, high-confidence gene bodies expressed in all four samples with at least 5 reads in each sample. We also found 223 novel RNA isoforms in previously annotated gene bodies. Of these 223 novel isoforms, 29 aligned to medically relevant genes such as MAOB, HLA-DRB1, and ABO.

Conclusions: Our results suggest that long reads aligned to the CHM13 reference genome have the potential to reveal novel gene bodies and transcript isoforms that were missed in previous studies. These methods can provide a more complete picture of the transcriptomic landscape of diseases, including Alzheimer's disease, and may generate information that helps inform future treatment and/or early diagnostic efforts.
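
The expression criterion stated in the results (at least 5 reads in each of the four samples) can be illustrated with a minimal Python sketch. The abstract does not describe how this filter was implemented, so everything below is an assumption for illustration: the function name, the dictionary layout, and the toy counts are hypothetical and are not the authors' code or data.

```python
from typing import Dict, List

MIN_READS = 5  # threshold stated in the abstract


def expressed_in_all_samples(
    counts: Dict[str, List[float]], min_reads: float = MIN_READS
) -> List[str]:
    """Return the feature IDs whose count is >= min_reads in every sample.

    `counts` maps a feature ID (gene body or isoform) to its per-sample
    read counts. This layout is a hypothetical convenience, not a format
    produced by any specific tool.
    """
    return sorted(
        feature_id
        for feature_id, per_sample in counts.items()
        if per_sample and all(c >= min_reads for c in per_sample)
    )


# Hypothetical toy counts for four samples (not real data).
toy_counts = {
    "novel_gene_A": [12, 7, 5, 9],    # passes: every sample has >= 5 reads
    "novel_gene_B": [30, 0, 14, 22],  # fails: one sample has 0 reads
    "novel_gene_C": [4, 6, 8, 10],    # fails: one sample has only 4 reads
}

kept = expressed_in_all_samples(toy_counts)
print(kept)
```

Requiring the threshold in every sample, rather than on the summed or mean count, keeps a feature from passing on the strength of a single highly expressed sample.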
