Abstract
In this thesis, the application of convolutional and recurrent machine learning techniques to several key structural properties of proteins is explored. Chapter 2 presents the rst application of an LSTM-BRNN in structural bioinformat- ics. The method, called SPOT-Disorder, predicts the per-residue probability of a protein being intrinsically disordered (ie. unstructured, or exible). Using this methodology, SPOT-Disorder achieved the highest accuracy in the literature without separating short and long disordered regions during training as was required in previous models, and was additionally proven capable of indirectly discerning functional sites located in disordered regions. Chapter 3 extends the application of an LSTM-BRNN to a two-dimensional problem in the prediction of protein contact maps. Protein contact maps describe the intra-sequence distance between each residue pairing at a distance cuto , providing key restraints towards the possible conformations of a protein. This work, entitled SPOT-Contact, introduced the coupling of two-dimensional LSTM-BRNNs with ResNets to maximise dependency propagation in order to achieve the highest reported accuracies for contact map preci- sion. Several models of varying architectures were trained and combined as an ensemble predictor in order to minimise incorrect generalisations. Chapter 4 discusses the utilisation of an ensemble of LSTM-BRNNs and ResNets to predict local protein one-dimensional structural properties. The method, called SPOT-1D, predicts for a wide range of local structural descriptors, including several solvent exposure metrics, secondary structure, and real-valued backbone angles. SPOT-1D was signi cantly improved by the inclusion of the outputs of SPOT-Contact in the input features. Using this topology led to the best reported accuracy metrics for all predicted properties. The protein structures constructed by the backbone angles predicted by SPOT-1D achieved the lowest average error from their native structures in the literature. Chapter 5 presents an update on SPOT-Disorder, as it employs the inputs from SPOT- 1D in conjunction with an ensemble of LSTM-BRNN's and Inception Residual Squeeze and Excitation networks to predict for protein intrinsic disorder. This model con rmed the enhancement provided by utilising the coupled architectures over the LSTM-BRNN solely, whilst also introducing a new convolutional format to the bioinformatics eld. The work in Chapter 6 utilises the same topology from SPOT-1D for single-sequence prediction of protein intrinsic disorder in SPOT-Disorder-Single. Single-sequence predic- tion describes the prediction of a protein's properties without the use of evolutionary information. While evolutionary information generally improves the performance of a computational model, it comes at the expense of a greatly increased computational and time load. Removing this from the model allows for genome-scale protein analysis at a minor drop in accuracy. However, models trained without evolutionary profi les can be more accurate for proteins with limited and therefore unreliable evolutionary information.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.