Abstract

With the avalanche of biological sequences generated in the post-genomic age, one of the most challenging problems in computational biology is how to effectively formulate the sequence of a biological sample (such as DNA, RNA or protein) as a discrete model or a vector that reflects its sequence-pattern information or captures its key features of interest. Although several web servers and stand-alone tools have been developed to address this problem, each of them can handle only one type of sample. Furthermore, the number of their built-in properties is limited, so it is often difficult for users to formulate biological sequences according to their desired features or properties. In this article, we propose a much more flexible web server called Pse-in-One (http://bioinformatics.hitsz.edu.cn/Pse-in-One/). It has a much larger number of built-in properties and, through its 28 different modes, can generate nearly all possible feature vectors for DNA, RNA and protein sequences. In particular, it can also generate feature vectors with properties defined by users themselves. These feature vectors can be easily combined with machine-learning algorithms to develop computational predictors and analysis methods for various tasks in bioinformatics and systems biology. It is anticipated that the Pse-in-One web server will become a very useful tool in computational proteomics, genomics and biological sequence analysis. Moreover, to maximize users’ convenience, its stand-alone version can also be downloaded from http://bioinformatics.hitsz.edu.cn/Pse-in-One/download/ and run directly on Windows, Linux, Unix and Mac OS.
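As a minimal illustration of what such a feature vector can look like, the sketch below computes a normalized k-mer (here dinucleotide) composition for a DNA sequence. It is a generic Python example built on our own assumptions: the function name kmer_composition, the lexicographic ordering of k-mers, and the choice to skip windows containing non-ACGT characters are all hypothetical. It is not taken from Pse-in-One and does not reproduce its pseudo component modes, which add sequence-order information on top of plain composition.

```python
from itertools import product


def kmer_composition(seq: str, k: int = 2, alphabet: str = "ACGT") -> list[float]:
    """Return normalized k-mer frequencies, ordered lexicographically by k-mer.

    Hypothetical helper for illustration; not Pse-in-One's implementation.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    # Slide a window of length k over the sequence, one position at a time.
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in index:  # skip windows with ambiguous bases such as N
            counts[index[word]] += 1
            total += 1
    # Normalize counts to frequencies; return zeros if no valid window exists.
    return [c / total for c in counts] if total else [0.0] * len(kmers)


vec = kmer_composition("ACGTACGT", k=2)
print(len(vec))  # 16 dinucleotides for k = 2
```

For "ACGTACGT" with k = 2 there are seven overlapping windows (AC, CG, GT, TA, AC, CG, GT), so AC, CG and GT each receive 2/7, TA receives 1/7, and the other twelve entries are zero.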

Highlights

  • To expedite analyses of the increasing number of biological sequences, many machine-learning algorithms have been introduced into computational biology

  • Although existing web servers have played important roles in stimulating the development of computational biology, they have the following problems: (i) lack of flexibility, i.e. each can handle only one type of biological sequence (DNA, RNA or protein); (ii) failure to keep up, i.e. they lack some recently proposed pseudo component modes; (iii) limited coverage, i.e. they cannot cover all possible physicochemical properties, nor properties defined by users themselves

  • The third category is pseudo amino acid composition (PseAAC), which incorporates the global or long-range sequence-order information of protein sequences into their feature vectors via the physicochemical properties of their constituent amino acids
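The pseudo amino acid composition idea can be sketched as follows, assuming Chou's original (type-1) formulation: each property is standardized over the 20 amino acids, each correlation factor θ_j averages the squared property differences between residues j positions apart, and the 20 composition terms are concatenated with λ weighted correlation terms so that the whole vector sums to 1. The function names, the defaults λ = 3 and w = 0.05, and passing properties as plain dictionaries are assumptions for illustration. Accepting arbitrary dictionaries echoes the idea of user-defined properties, but it is not Pse-in-One's actual interface.

```python
import statistics

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def standardize(prop: dict[str, float]) -> dict[str, float]:
    """Z-score a property over the 20 standard amino acids (population std)."""
    values = [prop[a] for a in AMINO_ACIDS]
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # a constant property would divide by zero
    return {a: (prop[a] - mean) / sd for a in AMINO_ACIDS}


def pseaac(seq: str, props: list[dict[str, float]],
           lam: int = 3, w: float = 0.05) -> list[float]:
    """Hypothetical type-1 PseAAC: 20 composition terms + lam order terms."""
    # Keep only the 20 standard residues.
    seq = "".join(a for a in seq.upper() if a in AMINO_ACIDS)
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    norm = [standardize(p) for p in props]

    def theta(r1: str, r2: str) -> float:
        # Mean squared difference of standardized properties between two residues.
        return sum((h[r2] - h[r1]) ** 2 for h in norm) / len(norm)

    freqs = [seq.count(a) / n for a in AMINO_ACIDS]
    # theta_j: average correlation over all residue pairs j positions apart.
    thetas = [sum(theta(seq[i], seq[i + j]) for i in range(n - j)) / (n - j)
              for j in range(1, lam + 1)]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


# Placeholder scale (NOT a real physicochemical property): residue index.
toy = {a: float(i) for i, a in enumerate(AMINO_ACIDS)}
vec = pseaac("MKTAYIAKQRQISFVKSHFSRQ", [toy], lam=3)
print(len(vec))  # 20 + lam = 23
```

The placeholder scale only demonstrates the shape of the output (20 + λ non-negative values summing to 1); in practice one would substitute real published scales such as hydrophobicity or side-chain mass, or a user's own property values.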
