Abstract

The advent of long-read sequencing offers a new method for detecting genomic structural variation (SV) in numerous rare genetic diseases. For autism spectrum disorder (ASD) cases where no pathogenic variants are found in the protein-coding regions of the genome, we proposed a scalable workflow to characterize the risk posed by SVs impacting non-coding elements of the genome. We applied whole-genome sequencing, using both long- and short-read technologies, to an Emirati family with three children with ASD. A series of analytical pipelines was established to identify a set of SVs with high sensitivity and specificity. At 15-fold coverage, long-read sequencing detected significantly more SVs (987 variants) than short-read sequencing (509 variants) (p-value < 1.1020 × 10⁻⁵⁷). Further comparison showed that 97.9% of the long-read variants fell within the 1–100 kb size range (p-value < 9.080 × 10⁻⁶⁷) and impacted over 5000 genes. Moreover, long-read sequencing detected variants impacting 604 non-coding RNAs (p-value < 9.02 × 10⁻⁹), comprising 58% microRNA, 31.9% lncRNA, and 9.1% snoRNA. Even at low coverage, long-read sequencing has been shown to be a reliable technology for detecting SVs impacting complex elements of the genome.
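To put the reported figures in proportion, the arithmetic behind them can be sketched in a few lines of Python. This is only an illustration: the constants are copied from the abstract, while the function names and the rounded counts are assumptions derived from the quoted percentages, not values reported by the authors.

```python
# Figures quoted in the abstract; everything derived from them below
# is an approximation, not a value reported by the authors.
LONG_READ_SVS = 987
SHORT_READ_SVS = 509
SHARE_IN_1_TO_100_KB = 0.979
NCRNA_TOTAL = 604
NCRNA_SHARES = {"microRNA": 0.58, "lncRNA": 0.319, "snoRNA": 0.091}


def fold_difference(numerator: int, denominator: int) -> float:
    """Ratio of two variant counts."""
    return numerator / denominator


def approx_counts(total: int, shares: dict[str, float]) -> dict[str, int]:
    """Convert fractional shares of a total into rounded counts."""
    return {name: round(total * share) for name, share in shares.items()}


fold = fold_difference(LONG_READ_SVS, SHORT_READ_SVS)
in_range = round(LONG_READ_SVS * SHARE_IN_1_TO_100_KB)
ncrna = approx_counts(NCRNA_TOTAL, NCRNA_SHARES)

print(f"Long-read vs short-read SV calls: {fold:.2f}-fold")
print(f"Long-read SVs in the 1-100 kb range: about {in_range}")
for name, count in ncrna.items():
    print(f"{name}: about {count}")
```

The quoted ncRNA shares add up to 99%, so the rounded counts do not sum to exactly 604; the remainder is not broken down in the abstract.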

Highlights

  • Structural genomic variants (SVs) such as copy-number variations (CNVs), inversions, and rearrangements account for a large portion of genetic diversity among individuals [1]

  • Our analysis of de novo SNVs identified variants impacting the DXO and CLCA4 genes, which were classified as variants of unknown significance (VOUS) following the American College of Medical Genetics and Genomics (ACMG) guidelines (Figure S1). In this scenario, where clinically diagnosed patients showed a low likelihood of pathogenic variants in protein-coding genes contributing to their condition, we aimed to study the underrepresented complex non-coding elements of the genome

  • We report the whole-genome long-read sequencing (LRS) for a family with triplet children diagnosed with autism spectrum disorders (ASD)


Summary

Introduction

Structural genomic variants (SVs) such as copy-number variations (CNVs) (deletions and duplications), inversions, and rearrangements account for a large portion of genetic diversity among individuals [1]. Short-read based next-generation sequencing technologies aid in the construction of complex genome assemblies, as well as in the identification and annotation of critical SVs [5]. However, they require substantial improvement because the genomic context is complex and contains extensive repeated regions [7,8], highlighting the need for more sensitive and accurate technologies to detect and interpret SVs [9]. Although short-read sequencing (SRS) technologies (e.g., Illumina) and their analytical tools are mature, reliable, and cost-effective for detecting single nucleotide variants and small indels, their sensitivity and specificity for detecting SVs across different size spectrums are still rudimentary, particularly for SVs within complex genomic regions [10]. Research shows that nanopore technology can build high-quality reference genomes [15], supports de novo assembly [16], and fills the gaps missed by SRS, enabling easy characterization of SVs and discovery of novel variants [17].
