Abstract

This PhD project pursues several related research topics: data mining of coarse-grained side chain orientation in the Protein Data Bank and prediction of that orientation for individual residues using statistical learning methods; the motions of proteins and protein complexes, studied with the elastic network model and statistical methods; and clustering of structures within an ensemble of NMR-derived protein structures. The first topic concerns side chain orientation in protein structures. A coarse-grained measure of side chain orientation is used, and its relationship to the hydrophobicity of each residue type is established. Alongside this work, software for visualizing the coarse-grained side chain orientation is developed in C++ with OpenGL. In addition, predictive models for the side chain orientation of individual residues are constructed using several statistical machine learning methods (general linear regression, regression trees, bagging of regression trees, neural networks, and support vector machines). The second topic concerns the dynamics of proteins and protein complexes, studied with the elastic network model. Here we examine how different superposition methods affect the correspondence between the normal modes and the experimental conformational changes extracted from a cluster of structures by principal component analysis; the maximum likelihood based superposition method gives a better correspondence for some protein structures. We also apply the elastic network model to the dynamics of the small ribosomal subunit: we perform a series of computational experiments in which protein subunits are removed, and study the effect of each removal on the motions of the resulting partial 30S structures simulated with the elastic network model.
Through these studies, we find that S6 interacts with S18 in the small ribosomal subunit, which is consistent with previous computational and experimental results from other researchers. Another project applies the principal component shaving method to cluster structures in an ensemble of NMR-derived protein structures. Principal component shaving is often used to find similar gene expression patterns in microarray experiments; here it is applied to cluster similar structures in an ensemble of NMR-derived protein structures. The results show that this method groups similar structures together. The results on coarse-grained side chain orientation and on the prediction of side chain orientation for each residue have already been published in two papers, on both of which I was the first author. Papers on the remaining work (the effects of different superposition methods on the correspondence between the normal modes and the experimental conformational changes from principal component analysis, the application of ANM to the 30S subunit, and the application of principal component shaving to clustering structures in an ensemble of NMR-derived protein structures) will be submitted soon.
