Abstract

BiDaS is a web application that generates massive Monte Carlo simulated sequence or numerical feature data sets (e.g. dinucleotide content, composition, transition and distribution properties) from small user-provided data sets. The BiDaS server enables users to analyze their data and generate large amounts of: (i) simulated DNA/RNA and amino acid (AA) sequences whose sequence and/or extracted feature distributions are practically identical to those of the original data; (ii) simulated numerical features whose distributions are identical to the originals, while preserving the exact 2D or 3D between-feature correlations observed in the original data sets. The server can project the provided sequences onto multidimensional feature spaces based on: (i) 38 DNA/RNA features describing conformational and physicochemical nucleotide sequence properties from the B-DNA-VIDEO database, (ii) 122 DNA/RNA features based on conformational and thermodynamic dinucleotide properties from the DiProDB database and (iii) the pseudo-amino acid composition of the initial sequences. To the best of our knowledge, this is the first available web server that allows users to generate vast numbers of biological data sets with realistic characteristics while keeping between-feature associations. These data sets can be used for a wide variety of current biological problems, such as the in-depth study of gene, transcript, peptide and protein groups/families, the creation of large data sets from just a few available members, and the strengthening of machine learning classifiers. All simulations use advanced Monte Carlo sampling techniques. The BiDaS web application is available at http://bioserver-3.bioacademy.gr/Bioserver/BiDaS/.

Highlights

  • The advent of new and powerful workstations and supercomputers has enabled systems biologists, computational biologists and bioinformaticians to design and implement complex biological models [1]

  • The web server can analyze and simulate data whose feature distributions are identical to those of the original data set (1D simulation), while preserving all 2D and 3D between-feature correlations observed in the provided data

  • ‘Sequence Driven simulation of Numerical Features’: users with DNA/RNA and AA sequences can calculate up to ten numerical features using pseudo-AA composition [10,11], 38 DNA/RNA features based on conformational and physicochemical DNA properties from the B-DNA-VIDEO database [12] and 122 DNA/RNA features based on conformational and thermodynamic dinucleotide properties from the DiProDB database [13]. These features are calculated concurrently for the sequences in the original user-provided data set and for any sequences generated using the ‘De novo simulation of Sequences’ module

Summary

Introduction

The advent of new and powerful workstations and supercomputers has enabled systems biologists, computational biologists and bioinformaticians to design and implement complex biological models [1]. The BiDaS simulator provides a user-friendly interface that helps users accurately generate data that follow both the evident and the hidden properties of the original data sets. Users can analyze the provided sequences and generate MC-simulated sequences with the same characteristics as the original samples, such as length distribution and nucleotide/AA compositional probabilities.
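
To make the idea of matching length distribution and compositional probabilities concrete, the following is a minimal Python sketch. It is not the BiDaS algorithm, whose sampling scheme is not described here. It assumes the simplest possible model: each simulated length is drawn from the observed lengths, and each residue is drawn independently according to the pooled composition of the input. The function name `simulate_sequences` and its parameters are hypothetical.

```python
import random
from collections import Counter


def simulate_sequences(originals, n, seed=None):
    """Draw n sequences that mimic the originals' lengths and composition.

    Assumption: residues are independent (zero-order model). This ignores
    dinucleotide transitions and any other higher-order structure.
    """
    rng = random.Random(seed)

    # Empirical length distribution: the observed lengths themselves.
    lengths = [len(s) for s in originals]

    # Empirical composition: pooled symbol counts over all input sequences.
    counts = Counter("".join(originals))
    symbols = sorted(counts)
    weights = [counts[c] for c in symbols]

    simulated = []
    for _ in range(n):
        length = rng.choice(lengths)
        residues = rng.choices(symbols, weights=weights, k=length)
        simulated.append("".join(residues))
    return simulated


# Toy input, made up for illustration only.
originals = ["ACGTAC", "GGCATT", "ACGT"]
sims = simulate_sequences(originals, 5, seed=0)
```

With a large `n`, the length histogram and symbol frequencies of `sims` approach those of `originals`. Reproducing transition properties would require a higher-order model, for example sampling each residue conditioned on the previous one.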
