PolishCLR: A Nextflow Workflow for Polishing PacBio CLR Genome Assemblies.

Jennifer Chang, Anna K Childers, Amanda R Stahlke, Andrew J Severin, Benjamin D Rosen, Sivanandan Chudalayandi

doi:10.1093/gbe/evad020

Abstract

Long-read sequencing has revolutionized genome assembly, yielding highly contiguous, chromosome-level contigs. However, assemblies from some third-generation long-read technologies, such as Pacific Biosciences (PacBio) continuous long reads (CLR), have a high error rate. Such errors can be corrected with short reads through a process called polishing. Although best practices for polishing non-model de novo genome assemblies were recently described by the Vertebrate Genomes Project (VGP) Assembly community, there is a need for a publicly available, reproducible workflow that can be easily implemented and run in a conventional high-performance computing environment. Here, we describe polishCLR (https://github.com/isugifNF/polishCLR), a reproducible Nextflow workflow that implements best practices for polishing assemblies made from CLR data. PolishCLR can be initiated from several input options that extend best practices to suboptimal cases. It also provides re-entry points at several key stages: identifying duplicate haplotypes with purge_dups, pausing for scaffolding if data are available, and running multiple rounds of polishing and evaluation with Arrow and FreeBayes. PolishCLR is containerized and publicly available to the greater assembly community as a tool for completing assemblies from existing, error-prone long-read data.

Journal: Genome Biology and Evolution
Publication Date: Feb 16, 2023
Citations: 4
License type: CC BY 4.0

Similar Papers

Cerulean: A Hybrid Assembly Using High Throughput Short and Long Reads
Viraj Deshpande ... Son Pham
01 Jan 2013

Long-read sequencing in ecology and evolution: Understanding how complex genetic and epigenetic variants shape biodiversity.
Dan G Bock ... Polina Novikova
Molecular Ecology | VOL. 32
01 Mar 2023

Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads.
Ryan R Wick ... Louise M Judd
PLOS Computational Biology | VOL. 13
08 Jun 2017

Complete Genome Sequences of Four Strains of Erwinia tracheiphila: A Resource for Studying a Bacterial Plant Pathogen with a Highly Complex Genome.
Breah Lasarre ... Gwyn A Beattie
Molecular Plant-Microbe Interactions® | VOL. 35
01 May 2022
