A Grammatical Inference Sequential Mining Algorithm for Protein Fold Recognition

Taysir Hassan, Marwa M, Ahmed Sharaf, Mohammed E

doi:10.14569/ijacsa.2014.051214

Abstract

Protein fold recognition plays an important role in computational protein analysis, since it can help determine the function of proteins whose structure is unknown. In this paper, a Classified Sequential Pattern mining technique for Protein Fold Recognition (CSPF) is proposed. The CSPF technique consists of two main phases: the sequential pattern mining phase and the fold recognition phase. In the sequential pattern mining phase, which serves as the training phase, the Mix & Test algorithm is developed based on Grammatical Inference. The Mix & Test algorithm minimizes I/O costs by scanning the database only once, discovers subsequence combinations directly from sequences in memory without searching the whole sequence file, requires no database projection, handles gaps, and works with variable-length sequences without having to align them. In addition, a parallelized version of the Mix & Test algorithm is applied to speed it up. In the fold recognition phase, the folds of unknown proteins are predicted via a proposed testing function. To test the performance, 36 SCOP protein folds are used; the accuracy rate is 75.84% for training data and 59.7% for testing data.
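The properties listed in the abstract (one pass over the data, gapped subsequence combinations taken directly from in-memory sequences, no alignment of variable-length sequences) can be illustrated with a small Python sketch. This is not the authors' Mix & Test algorithm, whose details are not given here; the function names `gapped_patterns` and `mine_patterns`, the `length` and `max_gap` parameters, and the toy records are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def gapped_patterns(seq, length=2, max_gap=1):
    """Yield subsequences of `length` residues in which consecutive picks
    skip at most `max_gap` residues. Illustrative only."""
    def extend(start, prefix):
        if len(prefix) == length:
            yield "".join(prefix)
            return
        # The next residue may sit 0..max_gap positions past the adjacent one.
        for nxt in range(start, min(start + max_gap + 1, len(seq))):
            yield from extend(nxt + 1, prefix + [seq[nxt]])

    for i in range(len(seq)):
        yield from extend(i + 1, [seq[i]])


def mine_patterns(records, length=2, max_gap=1):
    """Count, per fold, how many sequences contain each pattern.

    `records` is an iterable of (fold_label, sequence) pairs and is read
    exactly once; sequences may differ in length and are never aligned.
    """
    support = defaultdict(Counter)
    for fold, seq in records:  # single pass over the data
        # set() so that a pattern counts once per sequence
        for pattern in set(gapped_patterns(seq, length, max_gap)):
            support[fold][pattern] += 1
    return support


# Hypothetical toy data: fold labels and sequences are made up.
records = [("a", "ABC"), ("a", "ABD"), ("b", "CAB")]
support = mine_patterns(records, length=2, max_gap=0)
```

With `max_gap=0` the sketch reduces to counting contiguous fragments; raising `max_gap` lets a pattern skip residues, which is one simple way to model the gap handling the abstract mentions.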

Highlights

Protein fold recognition is an important step towards understanding protein three-dimensional structures and their biological functions
We introduce a Classified Sequential Pattern mining technique for Protein Fold Recognition (CSPF)

Summary

INTRODUCTION

Protein fold recognition is an important step towards understanding protein three-dimensional structures and their biological functions. Sequential mining algorithms have been proposed to predict protein folds. One of these is SPAM (Sequential PAttern Mining) [39], a SPADE-based algorithm. Here, Grammatical Inference (GI) is used as the backbone of the sequential pattern mining algorithm, which has achieved faster performance and higher accuracy than other sequential pattern mining algorithms for protein fold recognition. We introduce a Classified Sequential Pattern mining technique for Protein Fold Recognition (CSPF). CSPF consists of two main phases: 1) sequential pattern mining and 2) fold recognition. It handles gap constraints, uses data parallelization, and performs incremental updating.
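A minimal sketch of how a two-phase design with incremental updating might look is given below. The summary does not specify the paper's testing function, so the scoring rule here (summed per-fold pattern support, normalized by the number of training sequences) is a stand-in assumption. The class `FoldModel`, the helper `patterns`, and the toy sequences are likewise hypothetical.

```python
from collections import Counter, defaultdict


def patterns(seq, max_gap=1):
    """Residue pairs separated by at most `max_gap` skipped positions."""
    return {seq[i] + seq[j]
            for i in range(len(seq))
            for j in range(i + 1, min(i + 2 + max_gap, len(seq)))}


class FoldModel:
    """Toy two-phase classifier: mine patterns per fold, then score queries."""

    def __init__(self, max_gap=1):
        self.max_gap = max_gap
        self.support = defaultdict(Counter)  # fold -> pattern -> #sequences
        self.n_seqs = Counter()              # fold -> #training sequences

    def update(self, fold, seq):
        """Phase 1, incremental: fold in one new sequence without
        revisiting sequences seen earlier."""
        self.n_seqs[fold] += 1
        self.support[fold].update(patterns(seq, self.max_gap))

    def predict(self, seq):
        """Phase 2: return the fold whose mined patterns best cover `seq`.
        Assumed scoring rule, not the paper's testing function.
        Raises ValueError if no training sequence has been added."""
        query = patterns(seq, self.max_gap)

        def score(fold):
            return sum(self.support[fold][p] for p in query) / self.n_seqs[fold]

        return max(self.n_seqs, key=score)


# Hypothetical toy data: fold names and sequences are made up.
model = FoldModel(max_gap=1)
model.update("alpha", "AAAK")
model.update("alpha", "AAKA")
model.update("beta", "VVTV")
prediction = model.predict("AAKK")
```

Because `update` only adds to the stored counters, new training sequences can be absorbed at any time, which mirrors the incremental updating property in spirit.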

METHODS

Phase I: Sequential Pattern Mining

Apply Mix Strategy to generate sequential combinations

Phase II

Performance analysis of no-gap mix strategy

Performance analysis of gapped mix strategy

Performance analysis of memory consumption

Performance analysis of incremental updating process

Performance analysis of fold recognition phase

Findings

CONCLUSIONS

Journal: International Journal of Advanced Computer Science and Applications	Publication Date: Jan 1, 2014
Citations: 36	License type: cc-by

Similar Papers

An efficient model for information gain of sequential pattern from web logs based on dynamic weight constraint
Dhirendra Kumar Jha ... Archana Tomar
01 Oct 2010

WIS: Weighted Interesting Sequential Pattern Mining with a Similar Level of Support and/or Weight
Unil Yun
ETRI Journal | VOL. 29
08 Jun 2007

Techniques for Understanding User Usage Behavior on the Internet
Abhijit R Joshi ... Aparna Ranade-Halbe
International Journal of Computer Applications | VOL. 92
18 Apr 2014

Pushing Constraints to Generate Top-K Closed Sequential Graph Patterns
K Thammi ... S Sumalatha
International Journal of Computer Applications | VOL. 137
17 Mar 2016
