Abstract

Background: Clustering, a class of unsupervised machine learning methods, has been applied to physical activity data recorded by accelerometers to discover distinct patterns of physical activity and health outcomes. The prediction strength metric provides a criterion for determining the optimal number of clusters. The aim of this study is to provide specific guidance on applying prediction strength to time series accelerometer data.

Methods: For this purpose, we designed an extensive simulation study, creating a synthetic accelerometer data set based on data from a childhood obesity management trial. We evaluated the role of the prespecified prediction strength threshold as a key input parameter and compared the recommended threshold (between 0.8 and 0.9) with an approach we developed (Local Maxima).

Results: The choice of threshold had a large impact on performance. As the noise level increased (greater overlap between true clusters), lower thresholds outperformed the recommended threshold, which tended to underestimate the true number of clusters. In addition, sorting the data by intensity magnitude within windows of the time series of interest before clustering reduced sensitivity to the choice of threshold. For accelerometer data, we recommend using the Local Maxima approach together with a graphical evaluation of prediction strength as a function of k. Finally, we strongly suggest sorting the data before clustering, provided that sorting preserves meaning for the research question at hand.

Conclusion: Our recommendations can help future researchers discover more robust patterns in accelerometer data.
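To make the threshold question concrete, the Python sketch below computes prediction strength in the form proposed by Tibshirani and Walther. The data are split in two, each half is clustered, and for every test cluster the sketch measures the share of point pairs that the training centroids also place together. Prediction strength is the minimum of these shares. The sketch then compares the "largest k above a threshold" rule with a simple peak search on the resulting curve. It is an illustration under stated assumptions, not the authors' implementation. The basic k-means routine, the 50/50 split, all helper names, the peak rule (the abstract does not define Local Maxima), and the window-sorting step are hypothetical choices made for this example.

```python
import numpy as np


def assign(X, C):
    """Index of the nearest centroid (squared Euclidean distance) for each row of X."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)


def kmeans(X, k, rng, n_init=10, n_iter=100):
    """Basic Lloyd's k-means with k-means++ seeding; keeps the best of n_init runs."""
    best = None
    for _ in range(n_init):
        centers = [X[rng.integers(len(X))]]
        for _ in range(1, k):
            # k-means++: sample the next seed with probability proportional to D^2
            d2 = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(1)
            centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        C = np.array(centers, dtype=float)
        for _ in range(n_iter):
            labels = assign(X, C)
            # an empty cluster keeps its previous centroid
            new_C = np.array([X[labels == j].mean(0) if np.any(labels == j) else C[j]
                              for j in range(k)])
            if np.allclose(new_C, C):
                break
            C = new_C
        labels = assign(X, C)
        inertia = ((X - C[labels]) ** 2).sum()
        if best is None or inertia < best[0]:
            best = (inertia, labels, C)
    return best[1], best[2]


def prediction_strength(X, k, rng, test_frac=0.5):
    """Minimum over test clusters of the share of within-cluster pairs that the
    training-set centroids also assign to a common cluster."""
    if k == 1:
        return 1.0  # trivially 1 by definition
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_frac)
    X_te, X_tr = X[idx[:n_test]], X[idx[n_test:]]
    te_labels, _ = kmeans(X_te, k, rng)
    _, tr_centroids = kmeans(X_tr, k, rng)
    pred = assign(X_te, tr_centroids)  # test points classified by training centroids
    scores = []
    for j in range(k):
        members = pred[te_labels == j]
        n_j = len(members)
        if n_j < 2:
            continue  # the pair share is undefined for singleton clusters
        same = (members[:, None] == members[None, :]).sum() - n_j  # ordered pairs, i != i'
        scores.append(same / (n_j * (n_j - 1)))
    return min(scores) if scores else 0.0


def k_by_threshold(ps, threshold=0.8):
    """Threshold rule: the largest k whose prediction strength reaches the threshold."""
    return max(k for k, s in ps.items() if s >= threshold)


def local_maxima(ps):
    """Hypothetical peak rule: k >= 2 where ps(k) exceeds its left neighbour and is
    at least its right neighbour (ps(1) = 1 is ignored)."""
    ks = sorted(k for k in ps if k >= 2)
    peaks = []
    for i, k in enumerate(ks):
        left_ok = i == 0 or ps[k] > ps[ks[i - 1]]
        right_ok = i == len(ks) - 1 or ps[k] >= ps[ks[i + 1]]
        if left_ok and right_ok:
            peaks.append(k)
    return peaks


def sort_within_windows(X, window):
    """Hypothetical preprocessing: sort each row's values in descending order inside
    non-overlapping windows; trailing samples that do not fill a window are dropped."""
    X = np.asarray(X)
    n_t = X.shape[1] // window * window
    W = X[:, :n_t].reshape(X.shape[0], -1, window)
    return np.sort(W, axis=2)[:, :, ::-1].reshape(X.shape[0], n_t)


# Usage on toy data: three well-separated 2-D groups (not accelerometer data)
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [10, 0], [0, 10]])
X = np.vstack([c + rng.normal(scale=0.5, size=(60, 2)) for c in centers])
ps = {k: prediction_strength(X, k, rng) for k in range(1, 7)}
print(ps)
print(k_by_threshold(ps, 0.8), local_maxima(ps))
print(sort_within_windows(np.array([[1, 5, 2, 8, 3, 0]]), window=3))  # [[5 2 1 8 3 0]]
```

Printing or plotting the whole `ps` curve, rather than reporting only the selected k, mirrors the graphical evaluation recommended above. On noisier data, the curve often shows where a fixed 0.8 cutoff discards a plausible larger k.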
