EFFICIENT MINING OF CLOSED TREE PATTERNS FROM LARGE TREE DATABASES WITH SUBTREE CONSTRAINT

Viet Anh Nguyen, Koichiro Doi, Akihiro Yamamoto

doi:10.1142/s0218213012500261

Abstract

Mining frequent tree patterns from tree databases is of practical importance in domains such as Web mining and bioinformatics. Although efficient tree mining algorithms exist, they often lack interpretability: they tend to produce a huge number of patterns, most of which are meaningless to users. This paper addresses both demands: computational cost, that is, efficient generation of tree patterns, and interpretability. Meeting both requires an efficient way to incorporate the users' needs into the mining process. We propose a new top-down method for mining unordered closed tree patterns from a database of trees such that every mined pattern must contain a common piece of information, given as a tree specified by the user. This type of mining, called mining with a subtree constraint, would be useful, for example, in Web mining and bioinformatics, where users want to extract common patterns around some given information from the original data. The proposed algorithm is tested and compared with a state-of-the-art tree mining algorithm on real and artificial datasets, with very good results.
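To make the notion of a subtree constraint concrete, the following minimal sketch checks whether an unordered labeled tree contains a user-specified constraint tree and counts how many database trees do so. It is an illustration only, not the algorithm proposed in the paper: the tuple representation, the function names (`matches_at`, `contains`, `support`), and the choice of induced subtrees (parent-child edges preserved) are assumptions made for this example, since the abstract does not fix these details.

```python
# Illustrative sketch only; representation and names are assumptions,
# not taken from the paper. A tree is a pair (label, children), where
# children is a tuple of trees. Sibling order is ignored (unordered trees).

def matches_at(pattern, node):
    """True if `pattern` can be mapped onto `node` with roots aligned,
    preserving labels and parent-child edges (induced subtree, assumed)."""
    p_label, p_children = pattern
    n_label, n_children = node
    if p_label != n_label:
        return False
    return _assign(p_children, n_children, 0, set())

def _assign(p_children, n_children, i, used):
    """Backtracking search for an injective assignment of pattern children
    to distinct node children. Exponential in the worst case; fine for a demo."""
    if i == len(p_children):
        return True
    for j, candidate in enumerate(n_children):
        if j not in used and matches_at(p_children[i], candidate):
            used.add(j)
            if _assign(p_children, n_children, i + 1, used):
                return True
            used.remove(j)
    return False

def contains(tree, pattern):
    """True if `pattern` occurs somewhere in `tree`."""
    if matches_at(pattern, tree):
        return True
    return any(contains(child, pattern) for child in tree[1])

def support(database, pattern):
    """Number of database trees that contain `pattern`."""
    return sum(1 for tree in database if contains(tree, pattern))

# Hypothetical constraint: a node "a" with a child "b".
constraint = ("a", (("b", ()),))

# Hypothetical three-tree database.
database = [
    ("a", (("c", ()), ("b", ()))),   # contains the constraint at the root
    ("r", (("a", (("b", ()),)),)),   # contains it one level down
    ("a", (("c", ()),)),             # has "a" but no "b" child
]

# Under a subtree constraint, a candidate pattern is admissible only if
# it contains the constraint tree itself.
candidate = ("a", (("b", ()), ("c", ())))
admissible = contains(candidate, constraint)
```

In this sketch, a mined pattern would be kept only when `contains(pattern, constraint)` holds, and only database trees that contain the constraint can contribute to the support of such a pattern.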

Full Text