Long-read-tools.org: an interactive catalogue of analysis methods for long-read sequencing data.

Shanika L Amarasinghe, Matthew E Ritchie, Quentin Gouil

doi:10.1093/gigascience/giab003


Open Access



Abstract

Background: The data produced by long-read third-generation sequencers have unique characteristics compared to short-read sequencing data, often requiring tailored analysis tools for tasks ranging from quality control to downstream processing. The rapid growth in software that addresses these challenges for different genomics applications is difficult to keep track of, which makes it hard for users to choose the most appropriate tool for their analysis goal and for developers to identify areas of need and existing solutions to benchmark against.

Findings: We describe the implementation of long-read-tools.org, an open-source database that organizes the rapidly expanding collection of long-read data analysis tools and allows its exploration through interactive browsing and filtering. The current database release contains 478 tools across 32 categories. Most tools are developed in Python, and the most frequent analysis tasks include base calling, de novo assembly, error correction, quality checking/filtering, and isoform detection, while long-read single-cell data analysis and transcriptomics are areas with the fewest tools available.

Conclusion: Continued growth in the application of long-read sequencing in genomics research positions the long-read-tools.org database as an essential resource that allows researchers to keep abreast of both established and emerging software to help guide the selection of the most relevant tool for their analysis needs.

Highlights

The data produced by long-read third-generation sequencers have unique characteristics compared to short-read sequencing data, often requiring tailored analysis tools for tasks ranging from quality control to downstream processing
We describe the implementation of long-read-tools.org, an open-source database that organizes the rapidly expanding collection of long-read data analysis tools and allows its exploration through interactive browsing and filtering
The long-read-tools.org database is designed to catalogue analysis tools for long reads generated from genuine (Pacific Biosciences [PacBio] and Oxford Nanopore Technologies [ONT]) and synthetic (e.g., Hi-C, 10x, Bionano Genomics) long-read technologies

Summary

Background

Long-read sequencing technologies facilitate versatile exploration of genomes owing to their ability to generate reads spanning several thousand base pairs [1]. To keep up with the rapid growth in software for long-read analysis, we collated and categorized existing long-read analysis tools at long-read-tools.org. This database enables easy navigation of the available software, allowing users to filter by specific tasks to identify methods that suit their analysis objectives.
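The task-based filtering described above can be illustrated with a minimal sketch. This is not the implementation behind long-read-tools.org: the Tool record, its field names, the filter_tools helper, and the three placeholder entries are all assumptions made for illustration, and only the category and technology labels are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """One catalogue entry. The fields are hypothetical, not the real schema."""
    name: str
    language: str
    technologies: frozenset
    categories: frozenset


def filter_tools(tools, category=None, technology=None):
    """Return the tools that match every criterion that was supplied."""
    matches = []
    for tool in tools:
        if category is not None and category not in tool.categories:
            continue
        if technology is not None and technology not in tool.technologies:
            continue
        matches.append(tool)
    return matches


# Placeholder entries for illustration only; these are not database records.
CATALOGUE = [
    Tool("tool_a", "Python", frozenset({"ONT"}),
         frozenset({"base calling"})),
    Tool("tool_b", "C++", frozenset({"PacBio", "ONT"}),
         frozenset({"de novo assembly", "error correction"})),
    Tool("tool_c", "Python", frozenset({"PacBio"}),
         frozenset({"isoform detection"})),
]

# Narrow the catalogue to assembly tools that support ONT reads.
ont_assemblers = filter_tools(
    CATALOGUE, category="de novo assembly", technology="ONT"
)
print([tool.name for tool in ont_assemblers])
```

Combining criteria with a logical AND mirrors the way a user would progressively narrow a list of tools; omitting a criterion leaves that dimension unfiltered.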

Findings

Summary and Future Work
