Abstract
The aim of this thesis is to develop novel hybrid computational models for protein sequence analysis and secondary structure prediction. The research specifically deals with (a) protein sequence alignment and family identification, (b) prediction of secondary structures, and (c) prediction of contact maps and contact numbers. Protein sequence alignment and family identification have been widely approached in the past using the classical profile hidden Markov model (HMM), which is based on probability theory. Although it has been used successfully, the profile HMM is limited by its inherent statistical independence assumptions. To overcome this limitation, a novel fuzzy profile HMM architecture incorporating fuzzy measures and integrals is presented. The superior performance of the fuzzy profile HMM over the classical profile HMM is established through experiments on the widely studied globin and kinase families. A comparative study using Z-score plots and ROC analysis is also carried out on three variants of the fuzzy profile HMM, based on possibility, lambda (λ), and belief measures. The possibility-measure-based fuzzy profile HMM demonstrated the best performance. For secondary structure prediction, the prominent methods are mostly based on neural networks, which map a local window of residues in the sequence to the structural state of the central residue in the window and thus capture local interactions more effectively than distant interactions among residues. Alternatively, the secondary structure prediction problem has been approached using generative models based on semi hidden Markov models. These models have been effective in capturing non-local interactions through a joint sequence-structure probability distribution based on structural segments.
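The window-based mapping used by neural network predictors can be illustrated with a minimal sketch. The function name `residue_windows`, the padding symbol `-`, and the window half-width are assumptions chosen for illustration; the thesis does not specify them here.

```python
def residue_windows(sequence, half_width, pad="-"):
    """Return one fixed-length window per residue, centred on that residue.

    Positions that fall outside the sequence are filled with `pad`
    (an illustrative choice, not taken from the thesis).
    """
    padded = pad * half_width + sequence + pad * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(sequence))]


if __name__ == "__main__":
    # Each window would be encoded and mapped to the structural state
    # of its central residue by the predictor.
    for centre, window in zip("ACDE", residue_windows("ACDE", 1)):
        print(centre, window)
```

Because each prediction sees only the residues inside its window, interactions between residues farther apart than the half-width are invisible to such a predictor, which is the limitation that segment-based generative models aim to address.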
In the work reported in this thesis, a hierarchical model based on a semi hidden Markov model and a neural network is investigated, together with physicochemical and structural properties of the amino acids, without using evolutionary information (i.e., as a single-sequence method). The proposed hybrid model exploits the relative advantages of semi hidden Markov models, neural networks, and physicochemical properties of the amino acids for secondary structure prediction. The performance of the proposed architecture is further enhanced using neural network optimization and ensemble techniques. The novelty of the proposed architecture lies in its design and in the integration of its different components. The secondary structure of proteins is also influenced by residue contact maps and contact numbers. Novel Residue Contact Order matrices are proposed to study the preferences of amino acid residues for structural types based on contacts at different positions. The complementary information provided by these matrices is incorporated in the semi hidden Markov model, which achieves better accuracy than conventional approaches that lack this information. Further, a detailed theoretical framework has been developed for Markov chain Monte Carlo sampling in the semi hidden Markov model to predict contact maps and contact numbers. Investigations show that the predictions of the proposed approach closely follow the pattern of contact maps and contact numbers.
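For readers unfamiliar with the two quantities, a contact map and the contact numbers derived from it can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's procedure: the 8.0 distance cutoff, the minimum sequence separation of 2, and the use of one representative coordinate per residue are common conventions assumed here.

```python
import math


def contact_map(coords, cutoff=8.0, min_separation=2):
    """Symmetric boolean matrix: True where two residues are in contact.

    coords: one (x, y, z) point per residue (assumed representative atom).
    cutoff and min_separation are illustrative defaults; min_separation
    should be at least 1 so that a residue is never paired with itself.
    """
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                cmap[i][j] = cmap[j][i] = True
    return cmap


def contact_numbers(cmap):
    """Number of contacts per residue, i.e. the row sums of the map."""
    return [sum(row) for row in cmap]


if __name__ == "__main__":
    toy_coords = [(0, 0, 0), (3, 0, 0), (6, 0, 0), (20, 0, 0)]
    print(contact_numbers(contact_map(toy_coords)))
```

The contact number is a one-dimensional summary of the two-dimensional map, which is why the two are usually predicted and evaluated together.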