In this work, we present an Ant Colony Optimization heuristic to find subgroups with exceptional behavior in time-to-event data. Time-to-event, or survival, data analysis has its roots in statistics, where the main goal is to predict if and when an event will happen. In other words, the main goal in survival analysis has long been to build global models able to predict the time until an event occurs. Nevertheless, predictive models are very often used to compare stratified data in order to evaluate whether a variable is associated with the outcome. For instance, patients might be stratified according to a treatment variable (placebo or not) so that their models (survival curves) can be compared to decide on the effectiveness of the treatment. Although this is an effective approach when the variable of interest is already known, it offers no alternative for cases where specialists do not know how to stratify the data, that is, when they do not know which variable could be related to the outcome. Our approach targets exactly this case. Our method seeks combinations of variables that are associated with, i.e. describe, subgroups of individuals with unexpected or exceptional survival curves. In this sense, we complement the literature with a descriptive approach that is able to find and characterize such groups for specialists. Our method is based on the exceptional model mining framework and improves on a preliminary version presented at a conference. The main enhancement was to redesign our heuristic to retrieve interesting and diverse subgroups while minimizing three aspects of redundancy: coverage, description, and model. Our second extension concerns how the quality function is applied: users can now control whether the quality measure compares subgroups against the whole population or against the individuals that do not satisfy the descriptive rule.
Third, we conduct further experiments comparing the performance of our approach with state-of-the-art algorithms on real-world benchmark data sets. Finally, we present a case study showing a possible application of our method in the bioinformatics/health domain.
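To illustrate the two reference options for the quality function (whole population versus individuals outside the subgroup), here is a minimal Python sketch. It assumes right-censored data with an event indicator and estimates survival curves with the Kaplan-Meier estimator. The abstract does not specify the authors' quality measure, so this sketch substitutes one: the largest absolute gap between the subgroup's curve and the reference curve. The names `kaplan_meier`, `survival_at`, `subgroup_quality` and the `reference` parameter are hypothetical. This is not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate as (time, survival) steps at each observed event time.

    events[i] is 1 if the event was observed and 0 if the record is right-censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    steps: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all records sharing time t; censored ones still count as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            steps.append((t, surv))
        at_risk -= leaving
    return steps


def survival_at(steps: List[Tuple[float, float]], t: float) -> float:
    """Evaluate the right-continuous Kaplan-Meier step function at time t."""
    value = 1.0
    for time, s in steps:
        if time > t:
            break
        value = s
    return value


def subgroup_quality(times, events, in_subgroup, reference="complement") -> float:
    """Hypothetical quality: max absolute gap between subgroup and reference curves.

    reference="complement" compares against individuals outside the subgroup;
    reference="population" compares against the whole data set.
    """
    sub = [i for i, member in enumerate(in_subgroup) if member]
    if reference == "complement":
        ref = [i for i, member in enumerate(in_subgroup) if not member]
    elif reference == "population":
        ref = list(range(len(times)))
    else:
        raise ValueError("reference must be 'complement' or 'population'")
    if not sub or not ref:
        return 0.0
    km_sub = kaplan_meier([times[i] for i in sub], [events[i] for i in sub])
    km_ref = kaplan_meier([times[i] for i in ref], [events[i] for i in ref])
    # Both curves are step functions that change only at event times,
    # so checking the union of those times finds the largest gap.
    grid = sorted({t for t, _ in km_sub} | {t for t, _ in km_ref})
    return max((abs(survival_at(km_sub, t) - survival_at(km_ref, t)) for t in grid), default=0.0)


if __name__ == "__main__":
    times = [1, 2, 3, 4, 5, 6]
    events = [1, 1, 1, 1, 1, 1]
    members = [True, True, True, False, False, False]
    print(subgroup_quality(times, events, members, "complement"))  # 1.0
    print(subgroup_quality(times, events, members, "population"))
```

With `reference="complement"`, the subgroup is contrasted only with individuals outside it. With `reference="population"`, the reference curve also includes the subgroup's own members. The two settings can therefore score, and rank, the same subgroup differently.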