Subgroup Discovery in Data Sets with Multi–dimensional Responses: A Method and a Case Study in Traumatology

Lan Umek, Dragica Smrke, Annie Morin, Blaž Zupan, Jean-Hugues Chauchat, Marko Toplak, Gregor Makovec

doi:10.1007/978-3-642-02976-9_39

Abstract

Biomedical experimental data sets often include many features both at the input (description of cases, treatments, or experimental parameters) and at the output (outcome description). State-of-the-art data mining techniques can deal with such data, but they consider only one output feature at a time, disregarding any dependencies among the output features. In this paper, we propose a technique that treats many output features simultaneously, aiming to find subgroups of cases that are similar in both the input and the output space. The method is based on k-medoids clustering and the analysis of contingency tables, and it reports case subgroups with a significant dependency between the input and output space. We have used this technique in an exploratory analysis of clinical data on femoral neck fractures. The participating domain expert considered the discovered subgroups meaningful, and they sparked a number of hypotheses to be tested in further experiments.

Keywords: subgroup discovery, multi-label prediction, k-medoids clustering, χ² statistics, femoral neck fracture
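The abstract names two ingredients of the method, k-medoids clustering and contingency-table analysis, but does not say how they are combined. One plausible reading is sketched below in Python. It is an assumption, not the authors' procedure: cluster the cases separately in the input space and in the output space, cross-tabulate the two cluster assignments, and flag cells whose Pearson residual (O - E)/sqrt(E) exceeds a threshold as candidate subgroups. All function names are hypothetical. The simple alternating k-medoids variant with first-k initialization, the Euclidean distance, and the 1.96 residual threshold are likewise illustrative choices.

```python
import math
from itertools import product

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_medoids(points, k, max_iter=100):
    """Alternating (Voronoi-iteration) k-medoids; returns a cluster label per point."""
    medoids = list(range(k))  # assumption: the first k points are the initial medoids
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assign every point to its closest medoid.
        labels = [min(range(k), key=lambda c: euclidean(p, points[medoids[c]]))
                  for p in points]
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the old medoid of an empty cluster
                continue
            # New medoid: the member with the smallest total distance to the other members.
            new_medoids.append(min(members, key=lambda i: sum(
                euclidean(points[i], points[j]) for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels

def contingency(labels_a, labels_b, ka, kb):
    """Cross-tabulate two cluster assignments of the same cases."""
    table = [[0] * kb for _ in range(ka)]
    for a, b in zip(labels_a, labels_b):
        table[a][b] += 1
    return table

def chi_square_cells(table):
    """Overall chi-square statistic and Pearson residual (O - E) / sqrt(E) per cell."""
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    chi2, residuals = 0.0, {}
    for i, j in product(range(len(rows)), range(len(cols))):
        expected = rows[i] * cols[j] / n
        if expected == 0:
            continue
        r = (table[i][j] - expected) / math.sqrt(expected)
        residuals[(i, j)] = r
        chi2 += r * r
    return chi2, residuals

def discover_subgroups(inputs, outputs, k_in, k_out, threshold=1.96):
    """Report (input cluster, output cluster, size) cells that are over-represented."""
    table = contingency(k_medoids(inputs, k_in), k_medoids(outputs, k_out), k_in, k_out)
    chi2, residuals = chi_square_cells(table)
    # A large positive residual means cases share this input and output cluster
    # more often than independence would predict: a candidate subgroup.
    subgroups = [(i, j, table[i][j]) for (i, j), r in residuals.items() if r > threshold]
    return chi2, sorted(subgroups)

# Toy data: two groups of 10 cases, separated in both input and output space.
inputs = [(0.1 * i, 0.0) for i in range(10)] + [(10 + 0.1 * i, 10.0) for i in range(10)]
outputs = [(5.0 + 0.1 * i,) for i in range(10)] + [(20.0 + 0.1 * i,) for i in range(10)]
chi2, groups = discover_subgroups(inputs, outputs, k_in=2, k_out=2)
print(round(chi2, 2), groups)  # → 20.0 [(0, 0, 10), (1, 1, 10)]
```

On this toy data, each input cluster coincides with one output cluster, so only the two diagonal cells have residuals above the threshold. For real clinical data, the number of clusters and the threshold would still need to be chosen. The abstract does not state how the paper makes these choices.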
