Combining Subgroup Discovery and Clustering to Identify Diverse Subpopulations in Cohort Study Data

Uli Niemann, Bernhard Preim, Henry Völzke, Till Ittermann, Myra Spiliopoulou

doi:10.1109/cbms.2017.15

Abstract

Subgroup discovery (SD) is most valuable in applications where the goal is to generate understandable models. Epidemiologists search for statistically significant relationships between risk factors and outcomes in large, heterogeneous datasets that encompass information about the participants' health status gathered from questionnaires, medical examinations and image acquisition. SD algorithms can help epidemiologists by automatically detecting such relationships and presenting them as comprehensible rules, with the ultimate aim of improving the prevention, diagnosis and treatment of diseases. However, SD algorithms often produce large, overlapping rule sets, which forces the expert to carry out a manual post-filtering step that is time-consuming and tedious. In this work, we propose a clustering-based algorithm that hierarchically reorganizes rule sets and summarizes all important concepts while maintaining diversity between the rule clusters. For each cluster, a representative rule is selected and displayed to the expert, who in turn can drill down to the other cluster members. We evaluate our algorithm on two cohort study datasets, with hepatic steatosis and goiter, respectively, as the target variable. We report our findings on the effectiveness of the algorithm and present selected subpopulations.
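The abstract does not state how rules are compared, which linkage is used, or how a cluster's representative is chosen. The following is a minimal sketch of the general idea under those unknowns. It assumes that rules are compared by the Jaccard distance between the sets of participants they cover, that they are grouped by average-linkage agglomerative (hierarchical) clustering cut at a hypothetical distance `threshold`, and that each cluster is represented by its member with the highest quality score (for example, weighted relative accuracy). All names (`Rule`, `cluster_rules`, `summarize`) are hypothetical and do not describe the authors' implementation.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Rule:
    description: str   # human-readable conjunction, e.g. "bmi > 30 AND age > 50"
    cover: frozenset   # ids of participants covered by the rule
    quality: float     # assumed quality score, e.g. weighted relative accuracy


def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """1 - |a & b| / |a | b|; two empty covers are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_rules(rules, threshold=0.5):
    """Average-linkage agglomerative clustering of rules by cover overlap.

    Merging stops once the closest pair of clusters is farther apart than
    `threshold` (an assumed cut-off). Returns lists of rule indices."""
    clusters = [[i] for i in range(len(rules))]

    def linkage(c1, c2):
        # mean pairwise distance between members of the two clusters
        dists = [jaccard_distance(rules[i].cover, rules[j].cover)
                 for i in c1 for j in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        (a, b), d = min(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda t: t[1],
        )
        if d > threshold:
            break
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters


def summarize(rules, threshold=0.5):
    """Return (representative, members) per cluster, best clusters first.

    The representative is the highest-quality member (an assumption); the
    member list is what an expert would drill down into."""
    summary = []
    for cluster in cluster_rules(rules, threshold):
        members = [rules[i] for i in cluster]
        representative = max(members, key=lambda r: r.quality)
        summary.append((representative, members))
    summary.sort(key=lambda t: t[0].quality, reverse=True)
    return summary


# Toy example with made-up rules and covers (illustrative only).
rules = [
    Rule("bmi > 30", frozenset({1, 2, 3, 4}), 0.12),
    Rule("bmi > 30 AND age > 50", frozenset({1, 2, 3}), 0.10),
    Rule("alt > 40", frozenset({7, 8, 9}), 0.08),
    Rule("alt > 40 AND male", frozenset({7, 8}), 0.09),
]
for rep, members in summarize(rules, threshold=0.5):
    print(rep.description, [m.description for m in members])
# prints:
# bmi > 30 ['bmi > 30', 'bmi > 30 AND age > 50']
# alt > 40 AND male ['alt > 40', 'alt > 40 AND male']
```

The two overlapping "bmi" rules and the two overlapping "alt" rules collapse into one cluster each, so only two representatives are shown up front. A lower `threshold` keeps more, finer-grained clusters. Cutting the dendrogram at a fixed distance is only one of several ways to obtain the flat clusters that get displayed.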

Full Text