InterProScan 5: genome-scale protein function classification

P Jones, D Binns, S Pesseat, C McAnulla, M Scheremetjew, H McWilliam, A Mitchell, R Lopez, S Hunter, W Li, S.-Y Yong, M Fraser, J Maslen, A F Quinn, A Sangrador-Vegas, H.-Y Chang, G Nuka

doi:10.1093/bioinformatics/btu031

Open Access

Abstract

Motivation: Robust large-scale sequence analysis is a major challenge in modern genomic science, where biologists are frequently trying to characterize many millions of sequences. Here, we describe a new Java-based architecture for the widely used protein function prediction software package InterProScan. Developments include improvements and additions to the outputs of the software and the complete reimplementation of the software framework, resulting in a flexible and stable system that can use multiprocessor machines, conventional clusters, or both to achieve scalable distributed data analysis. InterProScan is freely available for download from the EMBL-EBI FTP site and the open source code is hosted at Google Code.

Availability and implementation: InterProScan is distributed via FTP at ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/ and the source code is available from http://code.google.com/p/interproscan/.

Contact: http://www.ebi.ac.uk/support or interhelp@ebi.ac.uk or mitchell@ebi.ac.uk

Highlights

The InterProScan software (Quevillon et al., 2005) is extensively used both by genome sequencing projects (Suen et al., 2011; Shulaev et al., 2011; Sato et al., 2011) and the UniProt Knowledgebase (UniProtKB) (The UniProt Consortium, 2012) to obtain a ‘first-pass’ profile of protein sequences’ potential functions.
Before describing the architecture used by the new version of InterProScan, it is necessary to explain how these analysis applications work in a general sense, as this has influenced the overall design of the system.
Once the search results are obtained, the InterProScan in-memory database is queried to find corresponding InterPro (Hunter et al., 2012) entries, and additional database annotations, such as Gene Ontology (The Gene Ontology Consortium, 2000) terms, are associated with the results.
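
The lookup step described in the last highlight can be illustrated with a minimal sketch. This is not InterProScan's own code (the software itself is Java-based); it only shows the general idea of joining raw signature matches against an in-memory table of entries and their associated Gene Ontology terms. All names here (`InterProEntry`, `SIGNATURE_TO_ENTRY`, `annotate`) and all accessions (`SIG_A`, `IPR_X`, `GO:EXAMPLE_1`, and so on) are hypothetical placeholders, not real identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterProEntry:
    """Hypothetical record for an integrated entry and its GO terms."""
    accession: str
    name: str
    go_terms: tuple = ()


# Hypothetical in-memory table: member-database signature -> entry.
# Every accession below is a placeholder, not a real identifier.
SIGNATURE_TO_ENTRY = {
    "SIG_A": InterProEntry("IPR_X", "Example domain",
                           ("GO:EXAMPLE_1", "GO:EXAMPLE_2")),
    "SIG_B": InterProEntry("IPR_Y", "Example family"),
}


def annotate(matches):
    """Attach entry and GO annotations to raw (signature, start, end) matches.

    Signatures with no corresponding entry are kept, with empty annotations.
    """
    annotated = []
    for signature, start, end in matches:
        entry = SIGNATURE_TO_ENTRY.get(signature)
        annotated.append({
            "signature": signature,
            "location": (start, end),
            "interpro": entry.accession if entry else None,
            "go_terms": list(entry.go_terms) if entry else [],
        })
    return annotated


# Two raw matches: one with a known entry, one without.
results = annotate([("SIG_A", 10, 120), ("SIG_C", 5, 40)])
```

Unmatched signatures are kept in the output here; a real pipeline might instead filter or flag them, depending on what the downstream output format requires.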

Summary

INTRODUCTION

The InterProScan software (Quevillon et al., 2005) is extensively used both by genome sequencing projects (Suen et al., 2011; Shulaev et al., 2011; Sato et al., 2011) and the UniProt Knowledgebase (UniProtKB) (The UniProt Consortium, 2012) to obtain a ‘first-pass’ profile of protein sequences’ potential functions. It does this by combining search applications that predict protein family membership and the presence of functional domains and sites, and by summarizing their outputs. This reimplementation of InterProScan addresses the previous versions’ weaknesses and adds new features to the software.

SOFTWARE ARCHITECTURE

Job management

New analysis algorithm and features

Match lookup service

Installation and configuration

INTERFACES AND ACCESS

Journal: Bioinformatics	Publication Date: Jan 29, 2014
Citations: 6621	License type: CC BY 3.0
