Abstract

It is of long-standing interest to probe protein primary structures through their sequence codes alone, with minimal assumptions and model parameters. Over several decades, approaches have drawn on the mathematics of digital information, most prominently alignment and database search algorithms. We follow an alternative line of inquiry by applying the Burrows-Wheeler transform (BWT) to archetypal sequences and sequence sets. The motivation overlaps with bioinformatics and pharmaceutical chemistry: to better understand protein structure information, with applications to drug and antibody targets. The approach, however, concentrates not on the sequences per se but on their information-conserved transforms. We demonstrate how such transforms bring otherwise obscure primary structure correlations to the surface. The methodology further leverages the assembly information in sequences to provide class databases for comparison. The databases reveal additional correlations that are nuanced and characteristic. We illustrate the workings of the BWT and then present data for archetypal drug and antibody targets. Our purpose is to establish new metrics and signatures for screening targets that complement those from bioinformatics and protein structure modeling. The programming and analysis are straightforward and readily accessible to data researchers. Further, the software is freely available from the authors on request.
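
As a concrete illustration of what an information-conserved transform means here, the following is a minimal sketch of the textbook BWT built from sorted rotations, together with its inverse. It is not the authors' software. The sentinel character `$`, the function names `bwt` and `inverse_bwt`, and the peptide fragment are assumptions chosen for illustration only.

```python
def bwt(seq: str, end: str = "$") -> str:
    """Return the Burrows-Wheeler transform of seq (naive sorted-rotation form)."""
    if end in seq:
        raise ValueError("sentinel must not occur in the sequence")
    s = seq + end
    # Sort every cyclic rotation, then read off the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str, end: str = "$") -> str:
    """Recover the original sequence from its transform (naive table rebuild)."""
    table = [""] * len(last)
    for _ in range(len(last)):
        # Prepend the last column to each row and re-sort; after len(last)
        # passes the table holds every rotation in sorted order.
        table = sorted(last[i] + table[i] for i in range(len(last)))
    original = next(row for row in table if row.endswith(end))
    return original[:-1]


# Hypothetical amino-acid fragment in one-letter code, for illustration only.
peptide = "MKVLAAGIVGL"
transformed = bwt(peptide)

# The transform is a permutation of the input plus the sentinel,
# and the inverse restores the sequence exactly: no information is lost.
assert sorted(transformed) == sorted(peptide + "$")
assert inverse_bwt(transformed) == peptide
```

The sorted-rotation form costs roughly quadratic memory in the sequence length, which is acceptable for single protein chains; suffix-array constructions are the usual choice for larger inputs.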
