Abstract
Enzyme function is much less conserved than anticipated: the level of sequence similarity needed to imply similar enzymatic function is much higher than the level needed to imply similar protein structure. This is because enzyme function may depend on very subtle structural details as well as many other physicochemical factors. Accordingly, an approach based on sequence similarity alone would hardly achieve a decent success rate in predicting enzyme sub-class, even for a dataset consisting of samples with ⩾50% sequence identity. To cope with this situation, the GO-PseAA predictor was adopted to identify the sub-class within each of the six main enzyme families. Even for much more stringent datasets in which no enzyme has ⩾25% sequence identity to any other, the overall success rates are 73–95%, suggesting that the GO-PseAA predictor can capture the core features of the statistical samples concerned and may become a useful high-throughput tool in proteomics and bioinformatics.