Abstract
Associative Classification (AC), which combines two important and distinct fields (classification and association rule mining), aims to build accurate and interpretable classifiers from association rules. The process used to generate association rules is exponential by nature; AC research has therefore focused on reducing redundant rules through rule pruning and rule ranking techniques. These techniques play an important part in improving efficiency; however, pruning may reduce accuracy by discarding interesting rules. Further, these techniques are time consuming in terms of processing and require domain-specific knowledge to select the best ranking and pruning strategy. To overcome these limitations, this research proposes an automata-based solution that improves the classifier’s accuracy while replacing ranking and pruning. A new merging concept is introduced that uses structure-based similarity to merge the association rules. The merging not only helps to reduce the classifier size but also minimizes the loss of information by avoiding pruning. Extensive experiments showed that the proposed algorithm is more efficient than existing AC, Naive Bayesian, rule-based, and tree-based classifiers in terms of accuracy, space, and speed. The merging takes advantage of the repetition in the rule set and keeps the classifier as small as possible.
Highlights
Classification is considered one of the main pillars of data mining (DM) and machine learning (ML) [1, 2]
A new storage structure is needed that reduces the size of the dataset, making processing less time consuming while improving accuracy. With these shortcomings in mind, we propose to replace ranking and pruning with our automata-based solution
Automata were utilized for two purposes: a) as a storage structure in classification; and b) to replace the rule pruning and rule ranking phases of associative classification
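To make the idea of an automaton as a rule store concrete, here is a minimal sketch. It is not the authors' structure-based merging algorithm, whose details are not given in this text; it assumes the simplest possible merge, in which rules that share a sorted antecedent prefix share states (a prefix tree, i.e. an acyclic deterministic automaton). The class name `RuleAutomaton`, its methods, and the toy rules are hypothetical.

```python
class RuleAutomaton:
    """Prefix-sharing store for class association rules (illustrative only)."""

    def __init__(self):
        self.transitions = [{}]  # state id -> {item: next state id}
        self.labels = [{}]       # state id -> {class label: rule count}

    def add_rule(self, antecedent, label):
        # Sort the items so that rules with common items share a path.
        state = 0
        for item in sorted(antecedent):
            nxt = self.transitions[state].get(item)
            if nxt is None:
                nxt = len(self.transitions)
                self.transitions.append({})
                self.labels.append({})
                self.transitions[state][item] = nxt
            state = nxt
        counts = self.labels[state]
        counts[label] = counts.get(label, 0) + 1

    def classify(self, items, default=None):
        # Visit every state reachable using only items present in the
        # record, and let each stored rule on the way cast one vote.
        present = set(items)
        votes = {}
        stack = [0]
        while stack:
            state = stack.pop()
            for label, count in self.labels[state].items():
                votes[label] = votes.get(label, 0) + count
            for item, nxt in self.transitions[state].items():
                if item in present:
                    stack.append(nxt)
        if not votes:
            return default
        # Ties are broken alphabetically (an arbitrary choice).
        return max(sorted(votes), key=votes.get)


# Hypothetical rules: (antecedent items, class label).
automaton = RuleAutomaton()
for antecedent, label in [(("a", "b"), "yes"),
                          (("a", "c"), "no"),
                          (("a", "b", "d"), "yes")]:
    automaton.add_rule(antecedent, label)

state_count = len(automaton.transitions)
prediction = automaton.classify({"a", "b", "d"})
```

In this toy case the three rules contain seven items in total but need only four transitions, because the shared prefixes `a` and `a, b` are stored once. No rule is discarded, which is the property the merging idea relies on.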
Summary
Classification is considered one of the main pillars of data mining (DM) and machine learning (ML) [1, 2]. It is a data analysis technique used to categorize data into classes based on common characteristics or associations in the data. Associative classification (AC) is based on association rule mining (ARM): first, the strongest Class Association Rules (CARs) are discovered from the dataset, and those rules are then converted into a classifier model. These strong associations, expressed as CARs, make the classifier easier to interpret and improve accuracy. A new storage structure is needed that reduces the size of the dataset, making processing less time consuming while improving accuracy. With these shortcomings in mind, we propose to replace ranking and pruning with our automata-based solution.
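The first step of AC, discovering CARs, can be sketched as follows. This is a brute-force illustration under stated assumptions, not the mining procedure used in this work: the function name `mine_cars`, the thresholds, the antecedent length limit, and the toy records are all hypothetical. A rule "itemset → class" is kept when its support (the fraction of records containing both the itemset and the class) and its confidence (that count divided by the number of records containing the itemset) meet the thresholds.

```python
from collections import Counter
from itertools import combinations


def mine_cars(records, min_support=0.4, min_confidence=0.7, max_len=2):
    """Return (antecedent, label, support, confidence) tuples.

    records: list of (set of items, class label) pairs.
    Enumerates every antecedent up to max_len items, so it is only
    suitable for tiny examples; the search space grows exponentially.
    """
    n = len(records)
    itemset_counts = Counter()  # how often each antecedent occurs
    rule_counts = Counter()     # how often antecedent and label co-occur
    for items, label in records:
        for k in range(1, max_len + 1):
            for combo in combinations(sorted(items), k):
                itemset_counts[combo] += 1
                rule_counts[(combo, label)] += 1

    cars = []
    for (combo, label), count in rule_counts.items():
        support = count / n
        confidence = count / itemset_counts[combo]
        if support >= min_support and confidence >= min_confidence:
            cars.append((combo, label, support, confidence))
    return cars


# Hypothetical toy dataset.
records = [
    ({"outlook=sunny", "windy=no"}, "play"),
    ({"outlook=sunny", "windy=yes"}, "play"),
    ({"outlook=rainy", "windy=yes"}, "stay"),
    ({"outlook=rainy", "windy=no"}, "play"),
    ({"outlook=rainy", "windy=yes"}, "stay"),
]
cars = mine_cars(records)
```

On this toy data three rules pass both thresholds: `outlook=sunny → play`, `windy=no → play`, and `outlook=rainy, windy=yes → stay`. A conventional AC pipeline would next rank and prune such rules before building the classifier; those are the phases this work proposes to replace.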