Abstract
Increasingly sensitive and large-scale genome sequencing has revealed extensive heterogeneity between clones in competitive environments. The study of large sets of genome data from the epithelial tissues of healthy patients has shown how mutationally diverse tissues become as they age. In interpreting such datasets, a fundamental question is whether mutations are selected for in competitive processes or arise through neutral drift. The ratio of non-synonymous to synonymous mutations (dN/dS) is commonly used to establish evidence of selection. Other techniques attempt to score codons or residues as “pathogenic” by training complex models against diverse datasets, and use this score to weight subsequent analysis of mutations. In both cases, whilst evidence of selection may be found, the techniques have known limitations, and interpreting the functional impact requires further work. Here we present “Darwinian Shift”, a workflow that uses user-selected metrics of protein or gene activity to establish evidence and mechanisms of selection. Testing against published healthy patient datasets, we show that, using conservation scores and mutations that induce residue swaps, the approach can reproduce the findings of existing techniques in the literature to establish evidence of selection. We go on to show that folding energies calculated with FoldX and Rosetta, alongside more abstract measures of disorder, can identify an overlapping, alternative set of genes that undergo misfolding due to selected-for mutations. We also use high-throughput molecular dynamics simulations of the transmembrane helix of NOTCH1 to explore how mutations change the orientation of the helix.