Abstract

Contrast pattern mining, which finds patterns describing the differences between two classes of data, is an important task in many application scenarios. Because real-world data (e.g., electronic medical records) are usually a mixture of nominal and numerical attributes, contrast pattern mining algorithms for nominal-numerical mixed data are in great demand. Existing contrast pattern mining algorithms either handle only a single attribute type or discretize numerical attributes into nominal ones using prior knowledge. As a result, they may produce contrast patterns with limited discriminative power, because they fail to exploit the information in the original data and rely on inflexible pattern forms. In this paper, we propose a novel algorithm, CHPMiner, based on extended gene expression programming (GEP), which mines a new kind of contrast pattern called the contrast hybrid pattern (CHP). A CHP contains both nominal attributes and numerical relationships among numerical attributes. Specifically, CHPMiner develops two sub-expressions and a novel structure to combine nominal and numerical attributes. Moreover, CHPMiner uses a specific fitness function to steer the evolution toward highly discriminating CHPs. Experiments on four real-world datasets show that CHPMiner outperforms the baseline algorithms, and a case study further demonstrates its effectiveness.

Keywords: Contrast pattern mining; Contrast hybrid pattern; Gene expression programming
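To make the idea of a contrast hybrid pattern more concrete, the following Python sketch shows one possible encoding under stated assumptions. It is not CHPMiner's actual representation, sub-expressions, or fitness function, none of which the abstract specifies. Here the numerical relationship is written as a GEP gene in Karva notation (decoded breadth-first, as in standard GEP) and read as `expression > 0`; the nominal part is a set of attribute-value equalities; and the fitness is plain support difference between the two classes, a common contrast measure used only as a stand-in. The names `HybridPattern`, `eval_kexpr`, `support`, `fitness`, and the toy records are all hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Sequence, Union

Record = Dict[str, Union[str, float]]

# Hypothetical GEP function set: binary arithmetic operators only.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def eval_kexpr(kexpr: Sequence[str], record: Record) -> float:
    """Evaluate a GEP gene in Karva notation (symbols listed breadth-first)."""
    children: List[List[int]] = []
    next_free = 1  # index of the next symbol not yet assigned as a child
    for i, sym in enumerate(kexpr):
        if i >= next_free:  # remaining symbols are the unused (non-coding) tail
            break
        arity = 2 if sym in OPS else 0
        children.append(list(range(next_free, next_free + arity)))
        next_free += arity
    if next_free > len(kexpr):
        raise ValueError("gene too short to close its expression tree")

    def ev(i: int) -> float:
        sym = kexpr[i]
        if sym in OPS:
            left, right = (ev(c) for c in children[i])
            return OPS[sym](left, right)
        return float(record[sym])  # terminal: name of a numerical attribute

    return ev(0)


@dataclass
class HybridPattern:
    """Hypothetical CHP-like pattern: nominal conditions AND a numerical relation."""
    nominal: Dict[str, str]  # e.g. {"sex": "F"}
    kexpr: List[str]         # numerical part, interpreted as expression > 0

    def matches(self, record: Record) -> bool:
        if any(record.get(attr) != value for attr, value in self.nominal.items()):
            return False
        return eval_kexpr(self.kexpr, record) > 0


def support(pattern: HybridPattern, data: Sequence[Record]) -> float:
    """Fraction of records in `data` matched by the pattern."""
    return sum(pattern.matches(r) for r in data) / len(data) if data else 0.0


def fitness(pattern: HybridPattern, positive: Sequence[Record],
            negative: Sequence[Record]) -> float:
    """Assumed stand-in fitness: target-class support minus other-class support."""
    return support(pattern, positive) - support(pattern, negative)


# Toy mixed nominal-numerical records, made up purely for illustration.
POSITIVE = [
    {"sex": "F", "a": 3, "b": 4, "c": 5},
    {"sex": "F", "a": 1, "b": 1, "c": 5},
    {"sex": "M", "a": 9, "b": 9, "c": 1},
    {"sex": "F", "a": 6, "b": 2, "c": 4},
]
NEGATIVE = [
    {"sex": "F", "a": 1, "b": 2, "c": 6},
    {"sex": "F", "a": 5, "b": 5, "c": 3},
    {"sex": "M", "a": 2, "b": 2, "c": 1},
    {"sex": "F", "a": 0, "b": 1, "c": 2},
]

if __name__ == "__main__":
    # The gene "- + c a b" decodes to (a + b) - c, so the relation is a + b > c.
    chp = HybridPattern(nominal={"sex": "F"}, kexpr=["-", "+", "c", "a", "b"])
    print(support(chp, POSITIVE), support(chp, NEGATIVE),
          fitness(chp, POSITIVE, NEGATIVE))
```

Running the script prints `0.5 0.25 0.25`: the pattern `sex = F` and `a + b > c` covers half of the target class and a quarter of the contrasting class. The numerical relation is evaluated on the raw attribute values rather than on discretized bins, which is the kind of information the abstract says discretization-based methods fail to exploit.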
