StartLink and StartLink+: Prediction of Gene Starts in Prokaryotic Genomes.

Karl Gemayel, Alexandre Lomsadze, Mark Borodovsky

doi:10.3389/fbinf.2021.704157

Abstract

State-of-the-art algorithms of ab initio gene prediction for prokaryotic genomes have been shown to be sufficiently accurate. Two such algorithms typically agree on predictions of gene 3′ ends. Nonetheless, their predictions of gene starts do not match for 15–25% of genes in a genome. This discrepancy is a serious issue that is difficult to resolve because sufficiently large sets of genes with experimentally verified starts are not available. We introduce StartLink, which infers gene starts from conservation patterns revealed by multiple alignments of homologous nucleotide sequences. We also introduce StartLink+, which combines the ab initio and alignment-based methods. The ability of StartLink to predict the start of a given gene is restricted by the availability of homologs in a database. On average, StartLink made predictions for 85% of genes per genome. StartLink+ accuracy was 98–99% on sets of genes with experimentally verified starts. Compared with database annotations, the annotated gene starts deviated from the StartLink+ predictions for ∼5% of genes in AT-rich genomes and for 10–15% of genes in GC-rich genomes on average. StartLink+ has the potential to significantly improve gene start annotation in genomic databases.
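The comparison with database annotations described in the abstract comes down to counting genes whose annotated start differs from the StartLink+ start. The Python sketch below is illustrative only. It assumes each gene is identified by its strand and 3′-end coordinate (genes are matched by their 3′ ends because gene finders largely agree on them). The helper name start_mismatch_rate and the toy coordinates are hypothetical and are not part of StartLink or its published data.

```python
from typing import Dict, Tuple

# Hypothetical gene key: (strand, 3'-end coordinate). Genes from two sources
# are matched by their 3' ends, on which gene finders largely agree.
GeneKey = Tuple[str, int]


def start_mismatch_rate(annotated: Dict[GeneKey, int],
                        predicted: Dict[GeneKey, int]) -> float:
    """Percentage of genes present in both sources whose annotated start
    differs from the predicted start."""
    shared = annotated.keys() & predicted.keys()
    if not shared:
        raise ValueError("no genes shared between the two sources")
    mismatches = sum(1 for key in shared if annotated[key] != predicted[key])
    return 100.0 * mismatches / len(shared)


if __name__ == "__main__":
    # Toy coordinates for illustration only; not data from the study.
    annotation = {("+", 1200): 300, ("+", 2500): 1450,
                  ("-", 4000): 4900, ("+", 6100): 5200}
    startlink_plus = {("+", 1200): 300, ("+", 2500): 1480,
                      ("-", 4000): 4900}
    print(f"{start_mismatch_rate(annotation, startlink_plus):.1f}%")
```

With the toy data, three genes appear in both sources and one start differs, so the script prints 33.3%. Only genes that have a StartLink+ prediction enter the comparison.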

Highlights

Accurate gene finding creates a solid foundation for downstream inference such as the construction of the species proteome, functional annotation of proteins, and inference of cellular networks
In addition to genes missed by either GeneMarkS-2 or StartLink, StartLink+ also missed genes whose starts predicted by GeneMarkS-2 and StartLink did not match
The lowest StartLink+ coverage, ∼75%, was observed for M. tuberculosis and R. denitrificans
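As the highlights note, StartLink+ reports a start only when GeneMarkS-2 and StartLink agree, so genes missed by either tool, or with conflicting starts, get no StartLink+ prediction. A minimal Python sketch of that consensus rule and of the resulting coverage follows. The dictionary representation, the function names startlink_plus and coverage, and the choice of the GeneMarkS-2 gene set as the coverage denominator are assumptions made for illustration. The actual tools read and write their own file formats.

```python
from typing import Dict, Tuple

# Hypothetical gene key: (strand, 3'-end coordinate). Predictors that agree
# on the 3' end can be compared gene by gene through this key.
GeneKey = Tuple[str, int]


def startlink_plus(gms2: Dict[GeneKey, int],
                   startlink: Dict[GeneKey, int]) -> Dict[GeneKey, int]:
    """Consensus rule: keep a start only if both predictors report the
    same start for the same gene. Genes missing from either input, or
    with conflicting starts, receive no prediction."""
    return {key: start for key, start in gms2.items()
            if startlink.get(key) == start}


def coverage(predictions: Dict[GeneKey, int],
             reference: Dict[GeneKey, int]) -> float:
    """Percentage of reference genes (assumed here: the GeneMarkS-2 gene
    set) that received a consensus prediction."""
    if not reference:
        raise ValueError("reference gene set is empty")
    covered = predictions.keys() & reference.keys()
    return 100.0 * len(covered) / len(reference)


if __name__ == "__main__":
    # Toy coordinates for illustration only; not data from the study.
    gms2 = {("+", 1200): 300, ("+", 2500): 1450,
            ("-", 4000): 4900, ("+", 6100): 5200}
    startlink = {("+", 1200): 300, ("+", 2500): 1480,
                 ("-", 4000): 4900}  # no homologs found for ("+", 6100)
    consensus = startlink_plus(gms2, startlink)
    print(sorted(consensus))
    print(f"{coverage(consensus, gms2):.1f}%")
```

In this toy example the gene ending at 2500 is dropped because the two starts disagree, and the gene ending at 6100 is dropped because StartLink made no prediction. That leaves 2 of 4 genes, so the coverage printed is 50.0%.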

Summary

Introduction

Accurate gene finding creates a solid foundation for downstream inference, such as construction of the species proteome, functional annotation of proteins, and inference of cellular networks. Gene starts can be determined experimentally by several methods, such as N-terminal protein sequencing (Sazuka et al., 1999; Rudd, 2000; Yamazaki et al., 2006; Aivaliotis et al., 2007; Lew et al., 2011; Zhou and Rudd, 2013; de Groot et al., 2014), mass spectrometry (Rison et al., 2007), and frameshift mutagenesis (Smollett et al., 2009). These methods are time-consuming, so the number of genes with experimentally verified starts remains limited.

Journal: Frontiers in Bioinformatics
Publication date: Dec 9, 2021
Citations: 1
License: CC BY 4.0