Abstract

We introduce a novel interactive framework that handles both instance-level and temporal smoothness constraints for clustering large longitudinal data and for tracking how clusters evolve over time. It consists of a constrained clustering algorithm, called CVQE+, which optimizes the clustering quality, the constraint violations, and the historical cost between consecutive data snapshots. At the center of our framework is a simple yet effective active learning technique, named Border, which iteratively selects the most informative pairs of objects to query users about and updates the clustering with the new constraints. Those constraints are then propagated inside each data snapshot and between snapshots via two schemes, called constraint inheritance and constraint propagation, to further enhance the results. Moreover, a historical constraint is enforced between consecutive snapshots to keep their results consistent. Experiments show clustering results that are better than or comparable to those of state-of-the-art techniques, as well as high scalability on large datasets. Finally, we apply our algorithm to cluster phenotypes in patients with Obstructive Sleep Apnea and to track how these clusters evolve over time.

Highlights

  • In semi-supervised clustering, domain knowledge is typically encoded in the form of instance-level must-link and cannot-link constraints [11] for aiding the clustering process, enhancing the quality of results

  • We develop an efficient framework to cope with the above problems following the iterative active learning approach as in [17, 43]

  • We extend the work in [27] for investigating clinical clusters in a use case of patients with Obstructive Sleep Apnea (OSA) [22]


Summary

Introduction

In semi-supervised clustering, domain knowledge is typically encoded as instance-level must-link and cannot-link constraints [11], which aid the clustering process and enhance the quality of its results. Such constraints specify that two objects must or must not be placed in the same cluster, respectively. Davidson et al. [8] show that the clustering quality varies significantly across different constraint sets of equal size. Annotating constraints requires human intervention, an expensive and time-consuming task that should be minimized for a given expected clustering quality. How to choose a good and compact set of constraints rather than
