Floating Search Methodology for Combining Classification Models for Site Recognition in DNA Sequences.

Javier Perez-Rodriguez, Nicolas Garcia-Pedrajas, Aida De Haro-Garcia

doi:10.1109/tcbb.2020.2974221

Open Access

https://doi.org/10.1109/tcbb.2020.2974221

Abstract

Recognition of the functional sites of genes, such as translation initiation sites, donor and acceptor splice sites and stop codons, is a relevant part of many current problems in bioinformatics. The best approaches use sophisticated classifiers, such as support vector machines. However, with the rapid accumulation of sequence data, methods for combining many sources of evidence are necessary, as it is unlikely that a single classifier can solve this problem with the best possible performance. A major issue is that the number of possible models to combine is large, and using all of these models is impractical. In this paper we present a methodology for combining many sources of information to recognize any functional site using "floating search", a powerful heuristic applicable when the cost of evaluating each solution is high. We present experiments on four functional sites in the human genome, which is used as the target genome, and use another 20 species as sources of evidence. The proposed methodology shows significant improvement over state-of-the-art methods. The results show an advantage of the proposed method and also challenge the standard assumption that only genomes neither very close to nor very far from the human genome should be used to improve the recognition of functional sites.
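
To make the idea of floating search concrete, the following is a minimal sketch of sequential floating forward selection applied to a pool of candidate models. It is not the authors' implementation: the function name `floating_forward_selection`, the `score` callback (imagined here as a validation metric for a combination of models), the model names "a", "b", "c" and the toy score table are all assumptions made for illustration. Scores are cached because each evaluation is assumed to be expensive.

```python
def floating_forward_selection(candidates, score, max_size):
    """Sketch of sequential floating forward selection (hypothetical helper).

    candidates: hashable model identifiers.
    score: callable taking a frozenset of identifiers, higher is better.
    max_size: largest subset size to explore.
    """
    candidates = list(candidates)
    cache = {}

    def cached(subset):
        # Evaluate each subset at most once (evaluation is assumed costly).
        key = frozenset(subset)
        if key not in cache:
            cache[key] = score(key)
        return cache[key]

    best_by_size = {}  # subset size -> (score, subset)
    current = frozenset()
    while len(current) < max_size and len(current) < len(candidates):
        # Forward step: add the single model that improves the score most.
        remaining = [c for c in candidates if c not in current]
        current = max((current | {c} for c in remaining), key=cached)
        k = len(current)
        if k not in best_by_size or cached(current) > best_by_size[k][0]:
            best_by_size[k] = (cached(current), current)
        else:
            # Fall back to the best subset already known at this size.
            current = best_by_size[k][1]
        # Floating step: drop models while that beats the best smaller subset.
        while len(current) > 2:
            members = [c for c in candidates if c in current]
            reduced = max((current - {c} for c in members), key=cached)
            if cached(reduced) > best_by_size[len(reduced)][0]:
                best_by_size[len(reduced)] = (cached(reduced), reduced)
                current = reduced
            else:
                break
    if not best_by_size:
        return frozenset()
    return max(best_by_size.values(), key=lambda pair: pair[0])[1]


# Toy score table (invented numbers, not results from the paper).
toy_scores = {
    frozenset("a"): 0.50, frozenset("b"): 0.40, frozenset("c"): 0.30,
    frozenset("ab"): 0.60, frozenset("ac"): 0.55, frozenset("bc"): 0.70,
    frozenset("abc"): 0.65,
}
selected = floating_forward_selection(
    ["a", "b", "c"], lambda s: toy_scores[frozenset(s)], max_size=3
)
print(sorted(selected))  # ['b', 'c']
```

In this toy table, plain greedy forward selection would pass through {a, b} and end at {a, b, c}; the backward "floating" step is what lets the search discard "a" and recover the better pair {b, c}.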

Highlights

The recognition of functional sites within the genome is one of the most important problems in bioinformatics research
The results for translation initiation sites (TISs) support our approach of using different methods and selecting the best method for each case, as there is no clear winner
The standard approach obtained a total of 1,536,902 false positives (FP); this number was reduced to 299,766, which means more than one million fewer FPs
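
The false-positive figures in the last highlight can be checked with a few lines of arithmetic. The variable names below are illustrative assumptions; the two counts are taken directly from the highlight.

```python
# False-positive counts quoted in the highlight above.
fp_standard = 1_536_902  # standard approach
fp_after = 299_766       # after the reduction reported in the highlight

fp_removed = fp_standard - fp_after
relative_reduction = fp_removed / fp_standard

print(fp_removed)                   # 1237136
print(f"{relative_reduction:.1%}")  # 80.5%
```

The difference of 1,237,136 is consistent with the "more than one million fewer FPs" statement and corresponds to a reduction of roughly 80%.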

Summary

Introduction

The recognition of functional sites within the genome is one of the most important problems in bioinformatics research. Determining where different functional sites, such as promoters, translation start sites, translation initiation sites (TISs), donors, acceptors and stop codons are located provides useful information for many tasks [1]. For instance, the recognition of translation initiation sites, donors, acceptors and stop codons [2] is one of the most critical tasks for gene structure prediction. Many of the most successful gene recognizers that are currently in use implement an initial step of site recognition [3], which is followed by a process of combining the sites into meaningful gene structures. Accurate recognition is of the utmost importance for the whole gene structure prediction process. Many false positives might inundate the second step, thereby making it difficult to predict gene structures accurately. State-of-the-art approaches use powerful classifiers, such as support vector machines (SVMs), and consider moderately large sequences around the functional site of interest [2, 4, 5, 6].

Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Publication Date: Feb 16, 2020
Citations: 4
License type: publisher-specific-oa

Similar Papers

Stepwise approach for combining many sources of evidence for site-recognition in genomic sequences.
Javier Pérez-Rodríguez ... Nicolás García-Pedrajas
BMC Bioinformatics | VOL. 17
05 Mar 2016

COMPUTER ANALYSIS AND RECOGNITION OF FUNCTIONAL SITES VIA OLIGONUCLEOTIDE PATTERN DISTRIBUTIONS
A E Kel ... N A Kolchanov
01 Sep 1993

Improving translation initiation site and stop codon recognition by using more than two classes.
Javier Pérez-Rodríguez ... Nicolás García-Pedrajas
Bioinformatics (Oxford, England) | VOL. 30
04 Jun 2014

Recognition of Functional Sites in Protein Structures
Alexandra Shulman-Peleg ... Haim J Wolfson
Journal of Molecular Biology | VOL. 339
28 Apr 2004
