Mathematical basis of improved protein subfamily classification by a HMM-based sequence filter

Siddhartha Kundu

doi:10.1016/j.mbs.2017.09.001

Abstract

Informative phylogenetic analysis depends on the presence of curated and annotated sequences. This may be complemented by the simultaneous availability of empirical data pertaining to their in vivo function. Confounding sequences, with their similarity to more than one functional cluster, can therefore render any categorization ambiguous, subjective, and imprecise. Here, I analyze and discuss the development of a mathematical expression that can characterize a potentially confounding protein sequence. Specifically, statistical descriptors of combinatorially arranged profile HMM scores are computed and evaluated. The resultant data are then incorporated into an index of sequence suitability. The sequence may then be either recommended as suitable for inclusion or excluded altogether. The index is independent of experimental data and can be computed from the primary structure of the protein sequence. It can be used to trim previously grouped sequences, and thereby either finalize the composition of a training set or reduce the search space of sequences to be tested.
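The abstract does not state the expression itself, so the following is only an illustrative sketch of the general idea, not the index derived in the paper. It assumes that a sequence has already been scored against one profile HMM per subfamily, takes all pairwise score differences as a stand-in for the "combinatorially arranged" scores, and uses their mean as the statistical descriptor. The function names, the example scores, and the 0.5 cutoff are all hypothetical.

```python
from itertools import combinations
from statistics import mean


def pairwise_gaps(scores):
    """Absolute differences between every pair of profile HMM scores."""
    return [abs(a - b) for a, b in combinations(scores.values(), 2)]


def suitability_index(scores):
    """Hypothetical index: margin between the two best-scoring subfamilies,
    scaled by the mean pairwise score difference.

    A value near 0 means the sequence resembles two subfamilies almost
    equally well, i.e. it is potentially confounding.
    """
    if len(scores) < 2:
        raise ValueError("need scores against at least two subfamily profiles")
    ranked = sorted(scores.values(), reverse=True)
    margin = ranked[0] - ranked[1]
    scale = mean(pairwise_gaps(scores))
    # All scores identical: no subfamily is preferred over any other.
    return 0.0 if scale == 0 else margin / scale


def is_suitable(scores, cutoff=0.5):
    """Recommend inclusion when the index exceeds an assumed cutoff."""
    return suitability_index(scores) > cutoff


# Made-up scores for two sequences against three subfamily profiles.
clear_member = {"subfamily_A": 100.0, "subfamily_B": 20.0, "subfamily_C": 10.0}
confounder = {"subfamily_A": 100.0, "subfamily_B": 98.0, "subfamily_C": 10.0}

print(is_suitable(clear_member))  # clear margin over the runner-up
print(is_suitable(confounder))    # top two subfamilies nearly tied
```

Because the sketch needs only the scores, it mirrors the abstract's point that such an index can be computed from the primary structure alone, without experimental data.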

Journal: Mathematical Biosciences
Publication Date: Sep 13, 2017
Citations: 3

