Association rule mining on the Chinese social insurance fund dataset can effectively uncover errors, irregularities, and illegal acts by showing auditors the relationships among items, thereby improving audit quality and efficiency. However, traditional positive and negative association rule (PNAR) mining algorithms inevitably produce too many meaningless or contradictory rules when the two types of rules are mined simultaneously, which makes it hard for auditors to retrieve decision information. To reduce the number of low-reliability PNARs without missing interesting rules, this paper first proposes an improved PNAR mining algorithm with a minimum correlation and a triple confidence threshold, which controls the number of mined rules by narrowing the range of confidence settings. Then, a novel pruning algorithm based on the inclusion relation between rules' antecedents and consequents is given to remove redundant rules. After that, the proposed optimized PNAR mining approach is applied to the Chinese social insurance fund dataset, starting with mining the influence factors of audit features using a hash table. The experimental results on different datasets show that the proposed framework not only ensures the extraction of effective and interesting rules but also outperforms traditional approaches in both accuracy and efficiency, reducing the number of redundant PNARs by over 70.1% on the experimental datasets and by 78.5% on average in audit data mining.
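As a rough illustration of why correlation helps avoid contradictory rules, the sketch below uses the common textbook convention for PNAR mining rather than the algorithm proposed in this paper. When A and B are positively correlated, only A => B and not A => not B are considered; when they are negatively correlated, only A => not B and not A => B are considered. The function names, the rule labels, and the single `min_conf` threshold are assumptions made for illustration. The abstract does not specify the triple confidence threshold or the pruning step, so neither is reproduced here.

```python
def count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = frozenset(items)
    return sum(1 for t in transactions if items <= t)


def candidate_pnars(transactions, a, b, min_conf=0.6):
    """Return (rule form, confidence) pairs for itemsets `a` and `b`.

    Hypothetical helper: the correlation test decides which rule forms
    are allowed, and one confidence threshold filters them.
    """
    n = len(transactions)
    n_a = count(transactions, a)
    n_b = count(transactions, b)
    n_ab = count(transactions, frozenset(a) | frozenset(b))

    # Correlation is undefined if an itemset never or always occurs.
    if n_a in (0, n) or n_b in (0, n):
        return []

    # (joint count, antecedent count) for each of the four rule forms.
    forms = {
        "A => B": (n_ab, n_a),
        "A => not B": (n_a - n_ab, n_a),
        "not A => B": (n_b - n_ab, n - n_a),
        "not A => not B": (n - n_a - n_b + n_ab, n - n_a),
    }

    # corr(A, B) = supp(A and B) / (supp(A) * supp(B)); compare it with 1
    # using integer counts to avoid floating-point rounding.
    if n_ab * n > n_a * n_b:        # positively correlated
        keep = ["A => B", "not A => not B"]
    elif n_ab * n < n_a * n_b:      # negatively correlated
        keep = ["A => not B", "not A => B"]
    else:                           # independent: no rule is generated
        keep = []

    rules = []
    for name in keep:
        joint, antecedent = forms[name]
        conf = joint / antecedent
        if conf >= min_conf:
            rules.append((name, conf))
    return rules
```

Because the two groups of rule forms are mutually exclusive, a pair such as A => B and A => not B cannot both be emitted for the same itemsets, which is the kind of contradiction the abstract refers to.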