Abstract

BackgroundIn order to capture the vital structural information of the original protein, the symbol sequence was transformed into the Markov frequency matrix according to the consecutive three residues throughout the chain. A three-dimensional sparse matrix sized 20 × 20 × 20 was obtained and expanded to one-dimensional vector. Then, an appropriate measurement matrix was selected for the vector to obtain a compressed feature set by random projection. Consequently, the new compressive sensing feature extraction technology was proposed.ResultsSeveral indexes were analyzed on the cell membrane, cytoplasm, and nucleus dataset to detect the discrimination of the features. In comparison with the traditional methods of scale wavelet energy and amino acid components, the experimental results suggested the advantage and accuracy of the features by this new method.ConclusionsThe new features extracted from this model could preserve the maximum information contained in the sequence and reflect the essential properties of the protein. Thus, it is an adequate and potential method in collecting and processing the protein sequence from a large sample size and high dimension.

Highlights

  • In order to capture the vital structural information of the original protein, the symbol sequence was transformed into the Markov frequency matrix according to the consecutive three residues throughout the chain

  • Dataset A is divided into two types according to the subcellular location, and the feature vectors are extracted in batches

  • The classification accuracy by Fuzzy C-means algorithm (FCM) algorithm is calculated in order to examine whether the Compressive Sensing (CS) method is correct and feasible

Read more

Summary

Introduction

In order to capture the vital structural information of the original protein, the symbol sequence was transformed into the Markov frequency matrix according to the consecutive three residues throughout the chain. An appropriate measurement matrix was selected for the vector to obtain a compressed feature set by random projection. Protein feature extraction is a key step to construct a predictor based on machine learning technique. The critical attributes within the protein can be obtained by extracting its features from amino acid sequences, by comparing the different features of proteins to predict the homologous biological function or identifying proteins for the localization of subcellular sites. Some software tools have been established to generate various protein features, such as Pse-in-One [1], BioSeq-Analysis [2], Pse-Analysis [3], etc. Pse-in-One is a powerful web server which covers 8 different modes to obtain protein feature vectors based on pseudo components. BioSeq-Analysis is a useful tool for biological sequence analysis which can automatically complete three steps: feature extraction, predictor construction and performance evaluation. Obtaining the maximum representative features of the nature of the characteristics is known as feature extraction [4]

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.