Abstract

Background: “Tail-anchored (TA) proteins” is a collective term for transmembrane proteins that have a C-terminal transmembrane domain (TMD) and lack an N-terminal signal sequence. TA proteins account for approximately 3–5 % of all transmembrane proteins and mediate membrane fusion, regulation of apoptosis, and vesicular transport. Predicting TA proteins typically requires combining TMD prediction tools with signal sequence prediction tools.

Results: Here we developed a prediction system named TAPPM that predicts TA proteins solely from target amino acid sequences, based on knowledge of the sequence features of the TMDs and peripheral regions of TA proteins. Manually curated TA proteins were collected from the published literature. We constructed hidden Markov models (HMMs) of TA proteins and of three other types of transmembrane proteins with similar structures, and compared the likelihoods of a query sequence under these models to assess whether it is a TA protein.

Conclusions: Using the HMMs, we achieved high prediction accuracy, with area under the receiver operating characteristic (ROC) curve values reaching 0.963. A command-line tool written in Python is available at https://github.com/davecao/tappm_cli.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1202-7) contains supplementary material, which is available to authorized users.
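The abstract reports accuracy as the area under the ROC curve (AUC). As a reminder of what that number measures, the minimal Python sketch below computes AUC through its rank interpretation: the probability that a randomly chosen positive (TA) sequence receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the example scores are hypothetical and are not taken from the TAPPM code.

```python
def roc_auc(pos_scores, neg_scores):
    """Return the ROC AUC via its rank (Mann-Whitney) interpretation.

    Counts, over all positive/negative pairs, how often the positive
    scores higher; ties contribute 0.5.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical scores for three TA (positive) and two non-TA (negative) sequences.
print(roc_auc([0.9, 0.8, 0.4], [0.5, 0.3]))  # 5 of 6 pairs ranked correctly
```

The pairwise loop costs O(P × N) comparisons; for large datasets, a rank-sum formulation or a threshold sweep over sorted scores yields the same value more efficiently.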

Highlights

  • For predictions on the membrane protein (MP) dataset, the nonmembrane protein (NO) dataset, all negative data, and the TA dataset, scores were calculated from the likelihood values of the TA and MP models

  • We collected data from empirically confirmed TA proteins to include in the training data

  • Although its accuracy was somewhat lower than that of the conventional method, our method correctly predicted a few TA proteins that the conventional method missed
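The first highlight states that scores were calculated from the likelihood values of the TA and MP models. One common way to turn two model likelihoods into a single score is the log-likelihood ratio log P(x | TA) - log P(x | MP), where positive values favour the TA model. The minimal Python sketch below computes that ratio with the forward algorithm for two toy two-state HMMs. Everything in it is an assumption for illustration: the reduced hydrophobic/polar alphabet, the `loop`/`tmd` states, all parameter values, the use of end probabilities to encode "the sequence stops right after a TMD", and the helper names. TAPPM's actual models were trained on curated data, and its exact scoring formula is not given in this summary.

```python
import math

# Hypothetical reduced alphabet: hydrophobic residues -> "H", all others -> "P".
HYDROPHOBIC = set("AVILMFWC")


def reduce_alphabet(seq):
    """Map an amino acid sequence onto the toy H/P alphabet."""
    return "".join("H" if aa in HYDROPHOBIC else "P" for aa in seq.upper())


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(obs, model):
    """Forward algorithm in log space for an HMM with a silent end state.

    From state s the chain stops with probability end[s]; otherwise it
    moves to state t with probability (1 - end[s]) * trans[s][t].
    """
    start, trans, emit, end = model["start"], model["trans"], model["emit"], model["end"]
    states = list(start)
    alpha = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}
    for sym in obs[1:]:
        alpha = {
            t: logsumexp([alpha[s] + math.log(1.0 - end[s]) + math.log(trans[s][t])
                          for s in states])
            + math.log(emit[t][sym])
            for t in states
        }
    return logsumexp([alpha[s] + math.log(end[s]) for s in states])


def make_toy_model(end_loop, end_tmd):
    """Hypothetical two-state HMM; the TA and MP toy models differ only in end probabilities."""
    return {
        "start": {"loop": 0.9, "tmd": 0.1},
        "trans": {"loop": {"loop": 0.9, "tmd": 0.1}, "tmd": {"loop": 0.05, "tmd": 0.95}},
        "emit": {"loop": {"H": 0.3, "P": 0.7}, "tmd": {"H": 0.9, "P": 0.1}},
        "end": {"loop": end_loop, "tmd": end_tmd},
    }


TA_MODEL = make_toy_model(end_loop=0.001, end_tmd=0.05)  # tends to stop right after a TMD
MP_MODEL = make_toy_model(end_loop=0.05, end_tmd=0.001)  # tends to stop after a non-TMD tail


def ta_score(seq):
    """Log-likelihood ratio; positive values favour the toy TA model."""
    obs = reduce_alphabet(seq)
    return log_likelihood(obs, TA_MODEL) - log_likelihood(obs, MP_MODEL)


print(ta_score("SDEKRQSTNG" + "LIVAFLIVALLVAFLIVAL"))  # hydrophobic C-terminal stretch
print(ta_score("LIVAFLIVALLVAFLIVAL" + "SDEKRQSTNGSDEKRQSTNG"))  # hydrophilic C-terminal tail
```

Because the two toy models share their transition and emission parameters, the score here depends only on where each model prefers to stop; real TA and MP models would differ in many more parameters and would be trained rather than hand-set.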

Methods

Datasets

Before constructing a prediction tool using machine learning, we prepared amino acid sequence data for TA proteins and for non-TA proteins with different signal sequences as the positive and negative datasets, respectively.

Positive dataset

We manually curated TA protein data, mainly from published literature on TA proteins in humans (Homo sapiens), thale cress (Arabidopsis thaliana), and budding yeast (Saccharomyces cerevisiae) [14, 22, 23]. To exclude TA proteins without experimental confirmation, we searched the initially collected TA proteins against existing databases. 162 clusters were generated, and their 162 representative sequences were selected as the final version of the TA protein dataset (Table 5). This clustering condition is not one commonly used for general homology reduction. In our analysis of these 162 sequences, all but four had pairwise sequence identities below 20 % in their C-terminal 30-amino-acid subsequences, i.e., the regions subjected to prediction, demonstrating that our …
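The 20 % identity check on C-terminal 30-residue subsequences can be illustrated with a small Python sketch: it takes the last 30 residues of each sequence, aligns every pair globally with a basic Needleman–Wunsch dynamic program, and reports pairs at or above 20 % identity. The scoring scheme (match +1, mismatch -1, gap -1), the identity definition (identical columns divided by alignment length, gaps included), the function names, and the toy sequences are all assumptions for illustration; the summary does not state which alignment tool or identity definition the authors used.

```python
from itertools import combinations


def global_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns identical columns / alignment length."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell, counting columns and identities.
    i, j, identical, length = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        length += 1
    return identical / length if length else 0.0


def similar_c_terminal_pairs(seqs, window=30, threshold=0.20):
    """Return (name1, name2, identity) for pairs whose C-terminal windows reach the threshold."""
    tails = {name: s[-window:] for name, s in seqs.items()}
    hits = []
    for x, y in combinations(sorted(tails), 2):
        ident = global_identity(tails[x], tails[y])
        if ident >= threshold:
            hits.append((x, y, ident))
    return hits


# Hypothetical toy sequences (not from the TA dataset).
seqs = {
    "a": "X" * 10 + "ACDEFGHIKLMNPQRSTVWYACDEFGHIKL",
    "b": "ACDEFGHIKLMNPQRSTVWYACDEFGHIKL",
    "c": "W" * 30,
}
print(similar_c_terminal_pairs(seqs))  # [('a', 'b', 1.0)]
```

Only the C-terminal window is compared, so "a" and "b" count as identical even though "a" carries a different N-terminal extension; this mirrors the idea that redundancy matters only within the region subjected to prediction.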
