Reproducible and accessible analysis of transposon insertion sequencing in Galaxy for qualitative essentiality analyses

Galaxy Team, Delphine Larivière, Laura Wickham, Kenneth Keiler, Anton Nekrutenko

doi:10.1186/s12866-021-02184-4

Open Access

https://doi.org/10.1186/s12866-021-02184-4

Journal: BMC Microbiology
Publication Date: Jun 5, 2021
Citations: 2
License type: open-access

Affiliation: Pennsylvania State University

Abstract

Background: Significant progress has been made in advancing and standardizing tools for human genomic and biomedical research. Yet, the field of next-generation sequencing (NGS) analysis for microorganisms (including multiple pathogens) remains fragmented, lacks accessible and reusable tools, is hindered by local computational resource limitations, and does not offer widely accepted standards. One such “problem area” is the analysis of Transposon Insertion Sequencing (TIS) data. TIS allows probing of almost the entire genome of a microorganism by introducing random insertions of transposon-derived constructs. The impact of the insertions on survival and growth under specific conditions provides precise information about genes affecting specific phenotypic characteristics. A wide array of tools has been developed to analyze TIS data. Among the variety of options available, it is often difficult to identify which one can provide a reliable and reproducible analysis.

Results: Here we sought to understand the challenges and propose reliable practices for the analysis of TIS experiments. Using data from two recent TIS studies, we have developed a series of workflows that include multiple tools for data de-multiplexing, promoter sequence identification, transposon flank alignment, and read count repartition across the genome. Particular attention was paid to quality control procedures, such as determining the optimal tool parameters for the analysis and removal of contamination.

Conclusions: Our work provides an assessment of the currently available tools for TIS data analysis. It offers ready-to-use workflows that can be invoked by anyone in the world using our public Galaxy platform (https://usegalaxy.org). To lower the entry barriers, we have also developed interactive tutorials explaining details of TIS data analysis procedures at https://bit.ly/gxy-tis.

Highlights

Significant progress has been made in advancing and standardizing tools for human genomic and biomedical research
Using data from two recent Transposon Insertion Sequencing (TIS) studies, we have developed a series of workflows that include multiple tools for data de-multiplexing, promoter sequence identification, transposon flank alignment, and read count repartition across the genome
Particular attention was paid to quality control procedures, such as determining the optimal tool parameters for the analysis and removal of contamination

Summary

Introduction

Significant progress has been made in advancing and standardizing tools for human genomic and biomedical research. Yet the field of next-generation sequencing (NGS) analysis for microorganisms (including multiple pathogens) remains fragmented, lacks accessible and reusable tools, is hindered by local computational resource limitations, and does not offer widely accepted standards. One such “problem area” is the analysis of Transposon Insertion Sequencing (TIS) data. Regions containing insertions can tolerate disruptions and are non-essential, while locations devoid of insertions (or with underrepresented insertions) are likely under purifying selection and are considered essential in the given growth conditions. These conclusions depend on the assumption that random transposon insertion will impact every gene (library saturation). In addition to a binary readout (essential/non-essential), this approach yields information about the effects of up- and down-regulation of specific genes.
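The binary readout described above can be illustrated with a minimal Python sketch. It is not the method used in the paper or by any of the tools in the Galaxy workflows, which rely on statistical models. The `Gene` class, the function names, the toy coordinates, and the `min_density` cutoff are all hypothetical and chosen only for illustration. The sketch also assumes a saturated library, so that a lack of insertions reflects selection and not sparse sampling.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def insertion_density(gene, insertion_sites):
    """Unique insertion sites per base pair inside the gene."""
    length = gene.end - gene.start + 1
    hits = sum(1 for pos in set(insertion_sites) if gene.start <= pos <= gene.end)
    return hits / length


def classify(genes, insertion_sites, min_density=0.005):
    """Label each gene from its insertion density.

    min_density is an arbitrary illustrative cutoff (an assumption),
    not a recommended or published value.
    """
    calls = {}
    for gene in genes:
        density = insertion_density(gene, insertion_sites)
        if density >= min_density:
            calls[gene.name] = "non-essential"
        else:
            calls[gene.name] = "candidate essential"
    return calls


# Toy data (hypothetical): geneA carries many insertions, geneB almost none.
genes = [Gene("geneA", 1, 1000), Gene("geneB", 1001, 2000)]
sites = [50, 120, 300, 450, 610, 800, 990, 1500]
calls = classify(genes, sites)
print(calls)
```

In this toy input, geneA tolerates seven insertions over 1000 bp and is labeled non-essential, while geneB carries a single insertion and is flagged as a candidate essential gene. A fixed cutoff like this ignores gene length, insertion-site availability, and read counts, which is why dedicated tools are used in practice.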

Objectives

Methods

Results

Discussion

Conclusion