Abstract
Informative phylogenetic analysis is dependent on the presence of curated and annotated sequences. This may be complemented by the simultaneous availability of empirical data pertaining to their in vivo function. Confounding sequences, with their similarity to more than one functional cluster, can therefore, render any categorization ambiguous, subjective, and imprecise. Here, I analyze and discuss the development of a mathematical expression that can characterize a potential confounding protein sequence. Specifically, statistical descriptors of combinatorially arranged profile HMM scores are computed and evaluated. The resultant data is then incorporated into an index of sequence suitability. The sequence may then be recommended as either suitable for inclusion or be excluded all together. The index is independent of experimental data and, can, be computed from the primary structure of the protein sequence. This can be utilized to trim previously grouped sequences and can either finalize the composition of training set or reduce the search space of sequences to be tested.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.