Abstract

Ribonucleic acids (RNAs) are critical players in cellular activities; they are involved, for example, in the coding, decoding, regulation, and expression of genes. Their function is related to their three-dimensional (3D) structure, so understanding their structure is critical for understanding their function. Technological advances in high-throughput sequencing have made it possible to sequence many RNAs. The sequenced RNAs are stored in public databases, and their number keeps growing. Nevertheless, the vast majority of them lack a corresponding 3D structure: because RNA molecules are highly flexible, experimental RNA structure determination is challenging. A complementary approach is to use computer simulations to model RNA 3D structure starting from the sequence. Such simulations can be highly challenging because the energy landscape is enormous and complex. Including a priori information in molecular modeling tools can guide structure prediction more accurately by reducing the search space on the energy landscape. A particular example is providing pairs of nucleobases known to be spatially proximal as restraints. While several experimental approaches for identifying such pairs exist, a theoretical approach uses statistical and machine learning algorithms to mine information about nucleobase pairs from sequences. During the course of evolution, RNAs undergo mutations. Mutations that do not adversely affect survival take place randomly. Others, however, must occur in tandem: a change of nucleobase at one position can trigger a complementary change at a position that is far away in the sequence but nearby in the 3D structure, which preserves the structure and function of the RNA and ensures the survival of the organism.
Coordinated mutations leave imprints of nucleotide-pair co-evolution, and this co-evolutionary information can be extracted from a multiple sequence alignment (MSA) of homologous RNAs. In the last decade, inverse statistical methods based on generative models, known as direct-coupling analysis (DCA), have shown tremendous success in predicting spatially adjacent residue pairs of proteins from MSA data. Incorporating these pairs into molecular modeling tools has resulted in accurate protein 3D structure prediction at the level of experimental resolution. Inverse statistical methods have recently also begun to be used in RNA 3D structure prediction, but their success has so far been limited compared to protein structure prediction. This thesis presents a new and improved RNA contact prediction method and its application to RNA 3D structure prediction. In particular, the thesis (i) presents an implementation of state-of-the-art DCA algorithms in a lightweight, stand-alone, open-source software package; (ii) makes available a curated RNA dataset for testing and comparing the performance of contact prediction algorithms; (iii) introduces a new RNA contact prediction algorithm that combines DCA with a convolutional neural network and improves RNA contact prediction from MSAs; and (iv) provides a workflow for RNA 3D structure prediction that uses putative contacts obtained from the new algorithm as restraints in a molecular modeling tool based on a coarse-grained replica-exchange Monte Carlo method.
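
To make the idea of a co-evolutionary imprint concrete, the following minimal sketch scores every pair of alignment columns by mutual information. This is a deliberately simpler covariation measure than DCA and is not the method developed in the thesis: mutual information captures pairwise correlation only, whereas DCA fits a global model to separate direct from indirect couplings. The toy alignment and the function names are hypothetical and chosen for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    pair_counts = Counter((seq[i], seq[j]) for seq in msa)
    counts_i = Counter(seq[i] for seq in msa)
    counts_j = Counter(seq[j] for seq in msa)
    mi = 0.0
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * log2(p_ab * n * n / (counts_i[a] * counts_j[b]))
    return mi


def rank_column_pairs(msa):
    """Return ((i, j), score) tuples sorted by decreasing mutual information."""
    length = len(msa[0])
    scores = {
        (i, j): mutual_information(msa, i, j)
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy alignment (not real data): columns 0 and 5 always form a
# Watson-Crick pair, while the columns in between are fully conserved.
TOY_MSA = [
    "GAAAAC",
    "CAAAAG",
    "AAAAAU",
    "UAAAAA",
]

if __name__ == "__main__":
    for (i, j), score in rank_column_pairs(TOY_MSA)[:3]:
        print(f"columns {i} and {j}: {score:.2f} bits")
```

In this toy case the compensating columns 0 and 5 receive the highest score, while pairs involving a conserved column score zero. Real alignments additionally require handling of gaps, sequence reweighting, and phylogenetic bias, none of which this sketch attempts.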
