Semi-Supervised Learning for Classification of Protein Sequence Data

Brian R King, Chittibabu Guda

doi:10.1155/2008/795010

Abstract

Protein sequence data continue to become available at an exponential rate. Annotation of the functional and structural attributes of these data lags far behind, with only a small fraction of the data understood and labeled by experimental methods. Classification methods based on semi-supervised learning can increase the overall accuracy of classifying partly labeled data in many domains, but very few have demonstrated their effect on protein sequence classification. We show how proven methods from text classification can be applied to protein sequence data, considering both existing and novel extensions to the basic methods, and we demonstrate the restrictions and differences that must be taken into account. We compare our results against the transductive support vector machine and show superior results on the most difficult classification problems. Our results show that large repositories of unlabeled protein sequence data can indeed be used to improve predictive performance, particularly when fewer labeled protein sequences are available or the data are highly unbalanced.

Journal: Scientific Programming
Publication Date: Jan 1, 2008
Citations: 57
License type: CC BY 3.0
