Abstract

Motivation: Within bioinformatics, the textual alignment of amino acid sequences has long dominated the determination of similarity between proteins, with all that implies for shared structure, function and evolutionary descent. Despite the relative success of modern-day sequence alignment algorithms, so-called alignment-free approaches offer a complementary means of determining and expressing similarity, with potential benefits in certain key applications, such as regression analysis of protein structure-function studies, where alignment-based similarity has performed poorly.

Results: Here, we offer a fresh, statistical physics-based perspective on the question of alignment-free comparison, in the process adapting results from first-passage probability distributions to summarize the statistics of ensemble-averaged amino acid propensity values. In this article, we introduce and elaborate this approach.

Contact: d.r.flower@aston.ac.uk

Highlights

  • Determining the similarity between macromolecules is central to bioinformatics

  • Most approaches to protein sequence similarity use models of sequence evolution and compare amino-acid strings, searching for linear conservation of sequence

  • We develop and analyze propensity data by modeling sequences as time series, estimating the scaling regime of a generalized probability density function (PDF) of a variable derived from the original propensity data structure

Summary

Introduction

Determining the similarity between macromolecules is central to bioinformatics. While comparison of 3-dimensional macromolecular structures remains an active area, most work focuses on macromolecular sequences. Methods based on plotting so-called propensity scales (Nakai et al., 1988) have enjoyed long-standing popularity, with scales mirroring one or more amino acid properties, such as hydrophobicity (Hopp and Woods, 1981) or electronegativity. Such scales abound: AAindex has collected 545 different published scales (Kawashima et al., 2008). Our protocol enables us to abstract key features from propensity plots while remaining free of any text-based alignment scheme. We apply this alignment-independent approach to the analysis of protein sequences, evaluating it as a potential means of automatically characterizing and clustering large numbers of sequences.
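
As a minimal sketch of what a propensity plot encodes, the snippet below maps each residue of a sequence to a scale value, so that sequence position plays the role of time, and then smooths the series with a sliding-window mean. The Kyte-Doolittle hydropathy scale is used purely as an illustrative assumption; it is not necessarily one of the scales used in the article, and the function names `propensity_series` and `smooth` are hypothetical.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy values, chosen here only for illustration
# (an assumption; the article may rely on other scales, e.g. from AAindex).
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def propensity_series(sequence: str, scale: Dict[str, float]) -> List[float]:
    """Map each residue to its scale value; position acts as the 'time' index.

    Raises KeyError for any residue missing from the scale.
    """
    return [scale[residue] for residue in sequence.upper()]


def smooth(series: List[float], window: int = 3) -> List[float]:
    """Sliding-window mean; returns len(series) - window + 1 values."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Example: a short, made-up peptide (not taken from the article).
profile = propensity_series("ACDEFGHIK", KYTE_DOOLITTLE)
smoothed = smooth(profile, window=3)
```

Because the result is an ordinary numeric series rather than a string, two sequences can be compared through statistics of their series without first aligning them residue by residue.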

Persistence Analysis
Methods
Plotting of propensity data
Propensity analysis
Application to test cases
Findings
Discussion