Linked-read Technology Research Articles

Recent advances in long fragment read (LFR) technologies, also known as linked-read or read-cloud technologies, such as single tube long fragment reads (stLFR), 10X Genomics Chromium reads, and TruSeq synthetic long-reads, have enabled efficient haplotyping and genome assembly. However, in the case of stLFR and 10X Genomics Chromium reads, the long fragments of a genome are covered sparsely by reads in each barcode, and most barcodes are contained in multiple long fragments from different regions, which results in inefficient assembly when using long-range information. Thus, methods to address these shortcomings are vital for capitalizing on the additional information obtained using these technologies. We therefore designed IterCluster, a novel, alignment-free clustering algorithm that can cluster barcodes from the same target region of a genome, using k-mer frequency-based features and a Markov Cluster (MCL) approach to identify enough reads in a target region of a genome to ensure sufficient target genome sequence depth. The IterCluster method was validated using BGI stLFR and 10X Genomics Chromium read datasets. IterCluster had a higher precision and recall rate on BGI stLFR data than on 10X Genomics Chromium read data. In addition, we demonstrated how IterCluster improves de novo assembly results when using a divide-and-conquer strategy on a human genome data set (scaffold/contig N50 = 13.2 kbp/7.1 kbp vs. 17.1 kbp/11.9 kbp before and after IterCluster, respectively). IterCluster provides a new way to determine LFR barcode enrichment and a novel approach for de novo assembly using LFR data. IterCluster is open source and available at https://github.com/JianCong-WENG/IterCluster.
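
The N50 figures quoted above summarize assembly contiguity: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The following is a minimal Python sketch of that standard definition; the function name `n50` and the toy lengths are illustrative assumptions and are not taken from IterCluster.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    at which the running total first reaches half of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input: no sequences, so no N50


# Toy example (hypothetical lengths, in kbp): total = 54, half = 27.
contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(contigs))
```

A higher N50 means that half of the assembled bases sit in longer sequences, which is why it is used here to compare assemblies before and after clustering.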

Background: The human genome contains “dark” gene regions that cannot be adequately assembled or aligned using standard short-read sequencing technologies, preventing researchers from identifying mutations within these gene regions that may be relevant to human disease. Here, we identify regions with few mappable reads, which we call dark by depth, and others that have ambiguous alignment, which we call camouflaged. We assess how well long-read or linked-read technologies resolve these regions.

Results: Based on standard whole-genome Illumina sequencing data, we identify 36,794 dark regions in 6054 gene bodies from pathways important to human health, development, and reproduction. Of these gene bodies, 8.7% are completely dark and 35.2% are ≥ 5% dark. We identify dark regions that are present in protein-coding exons across 748 genes. Linked-read or long-read sequencing technologies from 10x Genomics, PacBio, and Oxford Nanopore Technologies reduce dark protein-coding regions to approximately 50.5%, 35.6%, and 9.6%, respectively. We present an algorithm to resolve most camouflaged regions and apply it to the Alzheimer’s Disease Sequencing Project. We rescue a rare ten-nucleotide frameshift deletion in CR1, a top Alzheimer’s disease gene, found in disease cases but not in controls.

Conclusions: While we could not formally assess the association of the CR1 frameshift mutation with Alzheimer’s disease due to insufficient sample size, we believe it merits investigation in a larger cohort. There remain thousands of potentially important genomic regions overlooked by short-read sequencing that are largely resolved by long-read technologies.
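
To make the "dark by depth" idea concrete, the sketch below scans a per-base read depth track and reports half-open intervals where depth stays at or below a cutoff. It is an illustration only, not the authors' pipeline: the function name `dark_by_depth`, the cutoff `max_depth=5`, the `min_length` filter, and the toy depth values are all assumptions introduced here.

```python
def dark_by_depth(depths, max_depth=5, min_length=1):
    """Return (start, end) half-open intervals of consecutively low depth.

    depths     -- per-base read depth for one contiguous region
    max_depth  -- hypothetical cutoff: positions at or below it count as dark
    min_length -- hypothetical minimum run length to report
    """
    regions = []
    start = None
    for pos, depth in enumerate(depths):
        if depth <= max_depth:
            if start is None:
                start = pos  # a new low-depth run begins here
        elif start is not None:
            if pos - start >= min_length:
                regions.append((start, pos))
            start = None
    # Close a run that extends to the end of the track.
    if start is not None and len(depths) - start >= min_length:
        regions.append((start, len(depths)))
    return regions


# Toy depth track (made-up values), not real sequencing data.
example = [30, 28, 4, 2, 0, 31, 29, 3, 1]
print(dark_by_depth(example))
```

Camouflaged regions would need a different signal, such as mapping quality, because reads do align there but their placement is ambiguous.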

Articles published on Linked-read Technology

IterCluster: a barcode clustering algorithm for long fragment read analysis.

Draft Genome of the Rice Coral Montipora capitata Obtained from Linked-Read Sequencing.

Systematic analysis of dark and camouflaged genes reveals disease-relevant genes hiding in plain sight

Minerva: an alignment- and reference-free approach to deconvolve Linked-Reads for metagenomics.

Highly Continuous Genome Assembly of Eurasian Perch (Perca fluviatilis) Using Linked-Read Sequencing

Linked read technology for assembling large complex and polyploid genomes
