Abstract

Glycans, or carbohydrate sugar chains, which play a number of important roles in the development and functioning of multicellular organisms, can be regarded as labeled ordered trees. A recent increase in the documentation of glycan structures, especially in the form of database curation, has made mining glycans important for the understanding of living cells. We propose a probabilistic model for mining labeled ordered trees, and we further present an efficient learning algorithm for this model, based on an EM algorithm. The time and space complexities of this algorithm are rather favorable, falling within the practical limits set by a variety of existing probabilistic models, including stochastic context-free grammars. Experimental results have shown that, in a supervised problem setting, the proposed method outperformed five other competing methods by a statistically significant factor in all cases. We further applied the proposed method to aligning multiple glycan trees, and we detected biologically significant common subtrees in these alignments where the trees are automatically classified into subtypes already known in glycobiology.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.