Abstract

Background

High-accuracy prediction tools are essential in the post-genomic era to define organellar proteomes in their full complexity. We recently applied a discriminative machine learning approach to predict plant proteins carrying peroxisome targeting signals type 1 (PTS1) from genome sequences. For Arabidopsis thaliana, 392 gene models were predicted to be peroxisome-targeted. The predictions were extensively tested in vivo, and a high proportion of the Arabidopsis proteins not previously known to be peroxisomal were verified experimentally.

Results

In this study, we validated the predictions experimentally in greater depth by focusing on the most challenging Arabidopsis proteins: those with unknown non-canonical PTS1 tripeptides and prediction scores close to the threshold. In vivo subcellular targeting analysis identified three novel PTS1 tripeptides (QRL>, SQM>, and SDL>) and two novel tripeptide residues (Q at position −3 and D at position −2). To understand why these particular proteins, among many Arabidopsis proteins carrying the same C-terminal tripeptides, were predicted to be peroxisomal, we computationally permuted the residues upstream of the PTS1 tripeptide and analyzed the resulting changes in prediction scores. The newly identified Arabidopsis proteins contained four to five residues with strong predicted targeting-enhancing properties at positions −4 to −12, upstream of the non-canonical PTS1 tripeptide. These predicted enhancer residues were unexpectedly diverse: besides basic residues, they included proline, hydroxylated (Ser, Thr), hydrophobic (Ala, Val), and even acidic residues.

Conclusions

Our computational and experimental analyses demonstrate that the plant PTS1 tripeptide motif is more diverse than previously thought, with a growing number of non-canonical sequences and allowed residues. Specific targeting-enhancing elements can be predicted for particular sequences of interest and are far more diverse in amino acid composition and positioning than previously assumed. Machine learning methods are becoming indispensable for predicting which proteins, among the numerous candidates carrying the same non-canonical PTS1 tripeptide, contain enough enhancer elements, in terms of number, positioning, and total strength, to direct peroxisome targeting.
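The in silico permutation analysis described in the Results can be sketched as a substitution scan: each residue at positions −4 to −12 is replaced in turn by every other standard amino acid, the sequence is re-scored, and the mean score loss is recorded per position. This is a minimal sketch under stated assumptions. The study's trained predictor is not reproduced here, so `toy_score` is a hypothetical stand-in that merely counts K and R in the upstream window. Reading "permuted" as exhaustive single-residue substitution is also an assumption. The names `upstream_effects` and `toy_score` are illustrative, not part of any published tool, and the example sequence is made up.

```python
from typing import Callable, Dict

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def upstream_effects(
    sequence: str,
    score: Callable[[str], float],
    first: int = 4,
    last: int = 12,
) -> Dict[int, float]:
    """Substitution scan over positions -first..-last (counted from the C-terminus).

    For each position, every other standard amino acid is substituted in turn and
    the sequence is re-scored. The value returned per position is the wild-type
    score minus the mean mutant score; positive values mark residues whose loss
    lowers the score, i.e. residues acting as targeting enhancers under `score`.
    """
    sequence = sequence.upper()
    if len(sequence) < last:
        raise ValueError("sequence is shorter than the scanned window")
    wild_type = score(sequence)
    effects: Dict[int, float] = {}
    for pos in range(first, last + 1):
        idx = len(sequence) - pos  # position -1 is the last residue
        original = sequence[idx]
        mutant_scores = [
            score(sequence[:idx] + aa + sequence[idx + 1:])
            for aa in AMINO_ACIDS
            if aa != original
        ]
        effects[-pos] = wild_type - sum(mutant_scores) / len(mutant_scores)
    return effects


def toy_score(sequence: str) -> float:
    """Hypothetical stand-in for a trained PTS1 predictor (NOT the study's model):
    +1 for each K or R at positions -12..-4."""
    window = sequence[-12:-3]
    return float(sum(residue in "KR" for residue in window))


if __name__ == "__main__":
    # Made-up example sequence ending in the non-canonical tripeptide QRL>.
    effects = upstream_effects("MAGSPKLRTAVSAQRL", toy_score)
    for position in sorted(effects):
        print(f"{position:>4}: {effects[position]:+.3f}")
```

With this toy scorer, the basic residues at −11 and −9 receive positive effects and the other scanned positions receive small negative ones. A real predictor would be plugged in through the `score` callable. Averaging over all substitutions, rather than testing a single replacement such as alanine, avoids tying the result to one arbitrary choice of replacement residue.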

Highlights

  • High-accuracy prediction tools are essential in the post-genomic era to define organellar proteomes in their full complexity

  • To validate the algorithms in greater depth, we selected further predicted Arabidopsis peroxisome targeting signal type 1 (PTS1) proteins for experimental validation according to specific criteria

  • We focused on proteins that preferentially carried putative novel PTS1 tripeptide residues, i.e. either at position −3 (H, R, and Q in HKL>, RKM>, and QRL>, respectively) or at position −2 (D in SDL>, Table 1)
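The selection criterion in the last highlight can be expressed as a simple sequence filter. This sketch assumes protein sequences given as plain one-letter strings. The residue sets are taken from the highlight (H, R, or Q at −3; D at −2), while the function names are hypothetical. The filter only flags candidates for testing; it does not predict targeting.

```python
# Putative novel residues named in the highlight.
NOVEL_AT_MINUS3 = {"H", "R", "Q"}
NOVEL_AT_MINUS2 = {"D"}


def c_terminal_tripeptide(sequence: str) -> str:
    """Return the last three residues in PTS1 notation, with '>' marking the C-terminus."""
    seq = sequence.strip().upper().rstrip("*")  # drop a trailing stop symbol if present
    if len(seq) < 3:
        raise ValueError("sequence is shorter than a tripeptide")
    return seq[-3:] + ">"


def has_putative_novel_residue(sequence: str) -> bool:
    """True if position -3 or -2 carries one of the putative novel residues."""
    tri = c_terminal_tripeptide(sequence)
    # tri[0] is position -3, tri[1] is position -2, tri[2] is position -1, tri[3] is '>'
    return tri[0] in NOVEL_AT_MINUS3 or tri[1] in NOVEL_AT_MINUS2


if __name__ == "__main__":
    for seq in ["MAQRL", "MSDL", "MASKL"]:  # made-up example sequences
        print(c_terminal_tripeptide(seq), has_putative_novel_residue(seq))
```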


Summary

Introduction

High-accuracy prediction tools are essential in the post-genomic era to define organellar proteomes in their full complexity. Many novel organellar proteins have been identified experimentally, and their physiological functions have been defined at the molecular level. Despite this success, experimental methods are limited in their protein identification capabilities by several parameters, such as technological sensitivity and organelle purity, and are restricted to major plant tissues and organs. This holds true in particular for small and fragile organelles such as peroxisomes, which can be isolated in sufficient purity and quantity from only a few plant species, generally from only one tissue type per organism (leaves, cotyledons, or endosperm), and only from organisms raised under optimal growth conditions. Complementary to experimental proteome research, protein targeting prediction from genome sequences has emerged as a central and essential tool in the post-genomic era to define organellar proteomes and to understand metabolic and regulatory networks [1,2,3,4].
