Abstract
The prediction of protein subcellular localization is an important step towards the prediction of protein function, and considerable effort has gone over the last decade into the development of computational predictors of protein localization. In this article we design a new predictor of protein subcellular localization, based on a Machine Learning model (N-to-1 Neural Networks) which we have recently developed. This system, in three versions specialised, respectively, on Plants, Fungi and Animals, has a rich output which incorporates the class “organelle” alongside cytoplasm, nucleus, mitochondria and extracellular, and, additionally, chloroplast in the case of Plants. We investigate the information gain of introducing additional inputs, including predicted secondary structure and localization information from homologous sequences. To accommodate the latter we design a new algorithm, which we present here for the first time. While we do not observe any improvement when including predicted secondary structure, we measure significant overall gains when adding homology information. The final predictor including homology information correctly predicts 74%, 79% and 60% of all proteins in the case of Fungi, Animals and Plants, respectively, and outperforms our previous, state-of-the-art predictor SCLpred, and the popular predictor BaCelLo. We also observe that the contribution of homology information becomes dominant over sequence information for sequence identity values exceeding 50% for Animals and Fungi, and 60% for Plants, confirming that subcellular localization is less conserved than structure. SCLpredT is publicly available at http://distillf.ucd.ie/sclpredt/. Sequence- or template-based predictions can be obtained, and up to 32 kbytes of input can be processed in a single submission.
Highlights
As the number of known protein sequences keeps growing, the necessity for fast, reliable annotations for these proteins continues to be of great importance, and is likely to remain so for the foreseeable future.
In (Mooney et al. 2011) we proposed a system for protein subcellular localization based on a novel neural network architecture.
Protein subcellular localization prediction is very closely related to the goal of protein function prediction: knowing in which cellular component proteins carry out their function is a first indication of what their function may be.
Summary
As the number of known protein sequences keeps growing, the necessity for fast, reliable annotations for these proteins continues to be of great importance, and is likely to remain so for the foreseeable future. As experimental methods are expensive, laborious and not always applicable, a substantial amount of work has been carried out, and is ongoing in the bioinformatics research community, for developing computational approaches able to produce such annotations. Protein function prediction, in particular, is one of the major challenges of bioinformatics. Being able to annotate protein functions fast and cheaply by computational means would produce a quantum leap in our knowledge of biology at a molecular level. Such knowledge, if accurate, might be effectively harnessed for knowledge discovery, medical therapy and drug design. As protein localization may be used as a starting point in function prediction systems, the former problem may be considered a subtask and an integral part of the latter. In (Casadio et al. 2008) and (Mooney et al. 2011) overviews of subcellular localization techniques are provided and many of the best performing public predictors are benchmarked.