Abstract

Background: The study of gene essentiality is fundamental to understand the basic principles of life, as well as for applications in many fields. In recent decades, dozens of sets of essential genes have been determined using different experimental and bioinformatics approaches, and this information has been useful for genome reduction of model organisms. Multiple in silico strategies have been developed to predict gene essentiality, but no optimal algorithm or set of gene features has been found yet, especially for non-model organisms with incomplete functional annotation.

Results: We have developed DELEAT v0.1 (DELetion design by Essentiality Analysis Tool), an easy-to-use bioinformatic tool which integrates an in silico gene essentiality classifier in a pipeline allowing automatic design of large-scale deletions in any bacterial genome. The essentiality classifier consists of a novel logistic regression model based on only six gene features which are not dependent on experimental data or functional annotation. As a proof of concept, we have applied this pipeline to the determination of dispensable regions in the genome of Bartonella quintana str. Toulouse. In this already reduced genome, 35 possible deletions have been delimited, spanning 29% of the genome.

Conclusions: Built on in silico gene essentiality predictions, we have developed an analysis pipeline which assists researchers throughout multiple stages of bacterial genome reduction projects, and created a novel classifier which is simple, fast, and universally applicable to any bacterial organism with a GenBank annotation file.

Highlights

  • The study of gene essentiality is fundamental to understand the basic principles of life, as well as for applications in many fields

  • Gene essentiality classifier (calculation of reference dataset and model training): computing the six selected features for all genes labelled with a Database of Essential Genes (DEG) identifier in the 30 selected reference organisms resulted in 91,748 total data points, which were used for model training and evaluation; of these, 79,906 are essential genes and 11,842 are non-essential

  • Our tool allows classification of all genes in a bacterial genome according to essentiality, automatic design of large-scale genome deletions based on this data, and assistance in the genome reduction process through complementary information
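The classifier mentioned in the highlights is a logistic regression over six gene features trained on an imbalanced reference set. The following is a minimal sketch of that general technique, not the DELEAT implementation: the six features are unnamed placeholders, the training data are synthetic, and the balanced class weighting is an assumption that the text above does not state. Only the class counts are taken from the text.

```python
import math
import random

# Class counts as reported in the highlights above.
N_ESSENTIAL = 79_906
N_NON_ESSENTIAL = 11_842
N_TOTAL = N_ESSENTIAL + N_NON_ESSENTIAL


def balanced_class_weights(counts):
    """Weight each class by total / (n_classes * class_count).

    Assumption: the source does not say how class imbalance was handled.
    """
    total = sum(counts.values())
    return {label: total / (len(counts) * n) for label, n in counts.items()}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Probability that a gene with feature vector x is essential (label 1)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, class_weights, lr=0.1, epochs=200):
    """Batch gradient descent on the class-weighted log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        norm = 0.0
        for xi, yi in zip(X, y):
            err = class_weights[yi] * (predict_proba(w, b, xi) - yi)
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
            norm += class_weights[yi]
        w = [wj - lr * g / norm for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / norm
    return w, b


def make_synthetic(n, seed=0):
    """Hypothetical data: six placeholder features, labels drawn at the reported class ratio."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < N_ESSENTIAL / N_TOTAL else 0
        center = 1.0 if label else -1.0
        X.append([rng.gauss(center, 1.0) for _ in range(6)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    weights = balanced_class_weights({1: N_ESSENTIAL, 0: N_NON_ESSENTIAL})
    X, y = make_synthetic(500)
    w, b = train_logistic(X, y, weights)
    hits = sum((predict_proba(w, b, xi) >= 0.5) == bool(yi) for xi, yi in zip(X, y))
    print(f"class weights: {weights}")
    print(f"training accuracy on synthetic data: {hits / len(y):.3f}")
```

With real data, each row of `X` would hold the six computed features of one gene and `y` its DEG-derived label; the 0.5 decision threshold used here is likewise an illustrative choice.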


Summary

Introduction

Solana et al. BMC Bioinformatics (2021) 22:444

The study of gene essentiality is fundamental to understand the basic principles of life, as well as for applications in many fields. A gene is considered essential for an organism if it is indispensable for its survival, that is, if its inactivation has a lethal effect. Building on this concept, the minimal genome is defined as the set of genetic elements necessary and sufficient to keep a modern-type cellular organism alive in ideal conditions, i.e. in a medium containing all essential nutrients and without stresses [5]. Multiple experimental and computational methods have been used to propose a core of essential elements that must be present in a minimal genome. These attempts have followed comparative genomics strategies, manual curation of essential gene sets according to theoretically essential cellular functions, and systems approaches [6]. Systematic mutagenesis or knock-down experiments have helped determine sets of essential genes in specific organisms. Irrespective of the approach used and the global genome size, all essential gene sets share an approximate gene count (200–500 genes) and their contents can be mapped to three essential biological pillars [9]: the cellular genetic machinery (DNA, RNA and protein metabolism), energetic and intermediary metabolism, and the cell envelope.

