Abstract

Identifying protein-coding regions in DNA sequences by computational methods is an active research topic. Welan gum, produced by Sphingomonas sp. WG, has great application potential in oil recovery and in the concrete construction industry. Predicting the coding regions in the Sphingomonas sp. WG genome and elucidating the mechanism underlying Welan gum synthesis are therefore important open problems. In this study, we apply the self-adaptive spectral rotation (SASR) method, which is based on the Triplet Periodicity property, to predict the coding regions in the whole-genome data of Sphingomonas sp. WG without any prior training process, and obtain 1115 suspected gene fragments. These fragments are subjected to a similarity search against the NCBI non-redundant protein sequences (nr) database with blastx, and 762 of the 1115 fragments (about 68%) are labeled as genes in the nr database.

Highlights

  • Genetic information is a set of general instructions that directs the translation from DNA to proteins

  • A gene is a nucleotide sequence that encodes a product with a specific biological function; it carries protein information and is the main carrier of the inheritance of biological traits

  • Without any prior training process, the SASR method, which is based on the Triplet Periodicity (TP) property of coding regions, provides a visual presentation of unannotated protein-coding regions in DNA sequences and thereby predicts the coding regions in the sequence
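
The Triplet Periodicity property mentioned above can be illustrated with a short sketch. This is not the SASR method itself; it is the generic period-3 spectral measure, computed as the squared magnitude of the Fourier component at frequency 1/3 of each base's indicator sequence and summed over the four bases. The function names `period3_power` and `sliding_period3`, and the default window and step sizes, are assumptions chosen for illustration only.

```python
import cmath


def period3_power(seq):
    """Period-3 spectral power of a DNA string.

    For each base, build its 0/1 indicator sequence, take the Fourier
    component at frequency 1/3, and sum the squared magnitudes.
    Illustrative measure only; not the SASR algorithm.
    """
    seq = seq.upper()
    total = 0.0
    for base in "ACGT":
        coeff = sum(
            cmath.exp(-2j * cmath.pi * n / 3)
            for n, ch in enumerate(seq)
            if ch == base
        )
        total += abs(coeff) ** 2
    return total


def sliding_period3(seq, window=351, step=3):
    """Return (start, power) pairs for windows slid along the sequence.

    The window and step defaults are hypothetical; windows with high
    power are candidates for coding regions under the TP assumption.
    """
    return [
        (start, period3_power(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]
```

A sequence with a strong codon-like repeat gives a large value, while a sequence with no period-3 structure gives a value near zero; in practice a threshold on the windowed power would still have to be chosen, which is the kind of tuning a training-free method aims to avoid.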


Summary

Introduction

Genetic information is a set of general instructions that directs the translation from DNA to proteins. The information carried by DNA is expressed as proteins, which build cell components and carry out the genetic instructions for life [1]. A gene is a nucleotide sequence that encodes a product with a specific biological function; it carries protein information and is the main carrier of the inheritance of biological traits. The coding sequences of eukaryotic genes are not arranged continuously on the DNA molecule but are separated by non-coding introns, and protein synthesis is guided by the coding exons. Given a genomic sequence, correctly identifying the extent of the protein-coding regions and their precise positions in that sequence is one of the central problems in bioinformatics [2, 3].

