Sequence conservation analyses offer a powerful glimpse of natural selection at work. Standard tools for measuring sequence conservation report it at individual positions in a multiple sequence alignment (MSA) and have proven indispensable in identifying highly constrained features such as active site residues. The advent of large-scale genomic sequencing efforts allows researchers to expand this paradigm and investigate more nuanced relationships between sequence and function. Here, we present a simple tool (SWiLoDD: Sliding Window Localized Differentiation Detection) that allows researchers to analyze local, rather than site-specific, conservation using a sliding window approach. Our tool accepts MSAs partitioned by a biological differentiator and returns localized differential enrichment metrics, indexed by alignment position, for amino acids of the user's choice. We present two case studies of this analysis in action: local-but-diffuse glycine enrichments in the ATPase subunits of thermophilic and psychrophilic bacterial gyrase homologs, and ligand- and interface-specific amino acid enrichments in halophilic bacterial crotonyl-CoA carboxylases/reductases. Although both case studies involve extremophilic bacterial proteins, our tool may be used to investigate any set of homologous sequences from which sub-groups can be meaningfully partitioned. Our results suggest that investigating differential localized conservation in partitioned MSAs will expand our understanding of how sequence conservation and protein function are connected.
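To make the sliding window idea concrete, the following minimal Python sketch computes a windowed enrichment difference between two partitions of an alignment. It is an illustration only and is not taken from SWiLoDD: the function names (`window_fraction`, `sliding_differential`), the choice of metric (a simple difference in residue fractions), and the treatment of gaps (excluded from the denominator) are all assumptions made for this example.

```python
from typing import List, Set


def window_fraction(seqs: List[str], residues: Set[str], start: int, width: int) -> float:
    """Fraction of non-gap characters in columns [start, start + width)
    that belong to `residues`, pooled over all sequences in the group.
    (Hypothetical helper; gaps '-' are ignored by assumption.)"""
    hits = 0
    total = 0
    for seq in seqs:
        for ch in seq[start:start + width]:
            if ch == "-":
                continue  # skip alignment gaps
            total += 1
            if ch in residues:
                hits += 1
    return hits / total if total else 0.0


def sliding_differential(group_a: List[str], group_b: List[str],
                         residues: str = "G", width: int = 3) -> List[float]:
    """For each window start position, return the enrichment of `residues`
    in group_a minus that in group_b. Both groups are assumed to come from
    the same alignment, so all sequences share one length."""
    wanted = set(residues)
    length = len(group_a[0])
    return [
        window_fraction(group_a, wanted, i, width)
        - window_fraction(group_b, wanted, i, width)
        for i in range(length - width + 1)
    ]


if __name__ == "__main__":
    # Toy aligned sequences, invented for illustration only.
    group_a = ["GGAG-", "GGAAA"]
    group_b = ["AAAGA", "AA-GA"]
    for pos, diff in enumerate(sliding_differential(group_a, group_b, "G", 3)):
        print(pos, round(diff, 3))
```

Positive values mark windows where the chosen residues are more frequent in the first partition, and negative values mark windows where they are more frequent in the second. A real analysis would likely also need sequence weighting and a significance test, both of which this sketch omits.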