Abstract

In this work we propose the hypothesis that replacing the current system of representing the chemical entities known as amino acids using Latin letters with one of several possible alternative symbolic representations will bring significant benefits to the human construction, modification, and analysis of multiple protein sequence alignments. We propose ways in which this might be done without prescribing the choice of actual scripts used. Specifically we propose and explore three ways to encode amino acid texts using novel symbolic alphabets free from precedents. Primary orthographic encoding is the direct substitution of a new alphabet for the standard, Latin-based amino acid code. Secondary encoding imposes static residue groupings onto the orthography of the alphabet by manipulating the shape and/or orientation of amino acid symbols. Tertiary encoding renders each residue as a composite symbol; each such symbol thus representing several alternative amino acid groupings simultaneously. We also propose that the use of a new group-focussed alphabet will free the colouring of amino acid residues often used as a tool to facilitate the representation or construction of multiple alignments for other purposes, possibly to indicate dynamic properties of an alignment such as position-wise residue conservation.

Highlights

  • In this work we propose the hypothesis that replacing the current system of representing the chemical entities known as amino acids using Latin letters with one of several possible alternative symbolic representations will bring significant benefits to the human construction, modification, and analysis of multiple protein sequence alignments

  • The IUPAC nomenclature evolved from an Science long ago christened the standard twenty protein- original proposal formulated during the 1950s by Frantisek making - or biogenic - amino acids; their full, informal names Sorm (1913-1980)

  • Our suggested encodings offer three options: the direct substitution of essentially random symbols for the conventional one-letter code, building the similarity apparent in amino acid properties into the graphical similarity of a new set of amino acid symbols, and combining several orthographic elements to create new symbols that better capture the multidimensional nature of amino acid similarity

Read more

Summary

Introduction

In this work we propose the hypothesis that replacing the current system of representing the chemical entities known as amino acids using Latin letters with one of several possible alternative symbolic representations will bring significant benefits to the human construction, modification, and analysis of multiple protein sequence alignments. We propose and explore three ways to encode amino acid texts using novel symbolic alphabets free from precedents. Primary orthographic encoding is the direct substitution of a new alphabet for the standard, Latin-based amino acid code. We propose that the use of a new group-focussed alphabet will free the colouring of amino acid residues often used as a tool to facilitate the representation or construction of multiple alignments for other purposes, possibly to indicate dynamic properties of an alignment such as position-wise residue conservation. One-letter codes used universally in bioinformatics derive, in Science might have chosen a very different set of letters

Methods
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call