Abstract

Protein remote homology detection and fold recognition are critical to the study of protein structure and function. Profile-based methods currently achieve state-of-the-art performance in this field. They rely on widely used sequence profiles such as the Position-Specific Frequency Matrix (PSFM) and the Position-Specific Scoring Matrix (PSSM). However, these profiles ignore sequence-order effects along the protein sequence. In this study, we propose a novel profile, the Sequence-Order Frequency Matrix (SOFM), which incorporates sequence-order information while extracting evolutionary information from a Multiple Sequence Alignment (MSA). Statistical tests and experimental results demonstrate its effectiveness. Combining SOFM with the previously proposed Top-n-grams approach, we applied it to remote homology detection and fold recognition and built a computational predictor called SOFM-Top. Evaluated on four benchmark datasets, SOFM-Top outperformed other state-of-the-art methods in this field, indicating that it is a more useful tool and that SOFM is a richer representation than PSFM and PSSM. Because profiles are widely used to construct computational predictors in studies of protein structure and function, SOFM has many potential applications.
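
To make the limitation of conventional profiles concrete, the following is a minimal sketch of how a PSFM can be computed from an MSA. It is not the SOFM construction, which the abstract does not specify. The function name `psfm`, the toy alignment, and the choice to skip gap characters are assumptions made for illustration only.

```python
from collections import Counter

# The 20 standard amino acids; any other character (e.g. '-') is treated as a gap.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def psfm(msa):
    """Return per-column amino acid frequencies for a list of aligned sequences.

    Hypothetical helper: each column is processed independently, so no
    information about neighbouring positions enters the profile.
    """
    if not msa:
        raise ValueError("alignment is empty")
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")

    matrix = []
    for col in range(length):
        # Count only standard residues in this column, ignoring gaps.
        counts = Counter(seq[col] for seq in msa if seq[col] in AMINO_ACIDS)
        total = sum(counts.values())
        row = {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}
        matrix.append(row)
    return matrix


# Toy alignment (assumed data, not from the study).
toy_msa = ["ACD", "ACE", "GCD", "A-D"]
profile = psfm(toy_msa)
```

Because every column is counted in isolation, permuting the columns of the alignment only permutes the rows of the result. This is the sense in which such a profile carries no sequence-order information.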

