Abstract

Bioinformatics has given rise to a distinct class of datasets known as high-dimensional datasets, which are characterized by a large number of features and a small number of samples. Traditional algorithms spend most of their running time mining a large number of small and mid-sized itemsets, which do not carry valuable or significant information. Recent research has focused on mining large-cardinality itemsets, called colossal itemsets, which are significant to many applications, especially in bioinformatics. Existing frequent colossal itemset mining algorithms fail to discover the complete set of significant frequent colossal itemsets. Moreover, the colossal itemsets they mine carry erroneous support information, which affects association analysis. Mining significant frequent colossal itemsets with accurate support information helps attain high accuracy in association analysis. This work proposes a novel pre-processing technique and a bottom-up row enumeration algorithm to mine significant frequent colossal itemsets with accurate support information. The pre-processing technique uses the minimum support threshold and the minimum cardinality threshold to prune irrelevant samples and features. Experimental results demonstrate that the proposed algorithm achieves higher accuracy than existing algorithms, and a performance study indicates the efficiency of the pre-processing technique.
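The abstract does not spell out the pruning rules, so the following is only a minimal sketch of how the two thresholds might be used together, not the authors' actual procedure. It assumes each sample (row) is a set of features, that a feature is irrelevant when fewer than `min_support` rows contain it, and that a row is irrelevant when it holds fewer than `min_cardinality` surviving features. The function name `prune_dataset` and its parameters are hypothetical.

```python
from collections import Counter


def prune_dataset(rows, min_support, min_cardinality):
    """Illustrative pruning sketch (assumed rules, not the paper's algorithm).

    rows: iterable of iterables, one collection of features per sample.
    Returns a list of sets containing only the surviving rows and features.
    """
    rows = [set(r) for r in rows]
    while True:
        # Support of a feature = number of rows that contain it.
        counts = Counter(f for r in rows for f in r)
        frequent = {f for f, c in counts.items() if c >= min_support}

        # Drop infrequent features from every row.
        pruned = [r & frequent for r in rows]
        # Drop rows too short to contain an itemset of the required cardinality.
        pruned = [r for r in pruned if len(r) >= min_cardinality]

        # Removing rows can lower feature supports, so repeat until stable.
        if pruned == rows:
            return pruned
        rows = pruned


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b", "c", "d"}, {"a", "b"}, {"d", "e"}]
    print(prune_dataset(data, min_support=2, min_cardinality=3))
```

The loop runs to a fixed point because removing a short row can push a feature below the support threshold, which in turn can shorten other rows. Whether the proposed technique iterates in this way is an assumption of the sketch.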
