Accelerometer cut points misclassify important behaviors. In addition, the intensity paradigm is not helpful for promoting specific behaviors for public health. Advances in computational techniques to classify accelerometer data into behaviors have been limited by training and testing within controlled observational settings. Previous studies have shown that accelerometers may not work well in older or obese populations, but these studies were limited to a single accelerometer feature, counts per minute.
PURPOSE: To compare machine-learned algorithms, developed from accelerometer features across multiple days of data, for estimating minutes of sitting, standing, walking, and biking.
METHODS: We collected 268,325 minutes (523 days) of hip-worn accelerometer and GPS data in a sample of scripted activities and in two free-living adult cohorts, with annotated person-worn image data as the ground truth. One cohort was cyclists (N=40, 70% male, mean age 36, mean BMI 23.4). The other cohort was overweight and obese women (N=36, mean age 55, mean BMI 32.0). A Random Forest technique and Hidden Markov Model smoothing were employed to predict minute-level behaviors from over 40 accelerometer features. Leave-one-out cross-validation was applied.
RESULTS: The algorithm trained and tested on the scripted activities performed with a mean accuracy of 92.7%. When applied to the cyclists, it performed with 70.9% accuracy. The algorithm trained and tested on the cyclists performed with a mean accuracy of 91.3%. The algorithm trained on the cyclists and applied to the overweight and obese women performed with 73.4% accuracy. The algorithm trained and tested on the women performed with a mean accuracy of 87.2%. The standard intensity cut points performed with 36.5% accuracy for walking, 8.7% accuracy for bicycling, and 77.2% accuracy for sedentary behaviors. The accelerometer features varied by age and obesity status.
CONCLUSION: Standard accelerometer cut points greatly misclassify key behaviors such as walking and bicycling. Algorithms developed in a controlled observational setting do not accurately predict free-living behavior over multiple days. Algorithms developed on one population may not apply to another group with different demographic and health characteristics. Supported by grant # U54 CA155435.
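As an illustration of the smoothing step named in METHODS, the following is a minimal sketch of Hidden Markov Model (Viterbi) smoothing applied to per-minute class probabilities, such as those a Random Forest might output. It is not the study's implementation. The function name `viterbi_smooth`, the `stay` transition probability of 0.9, the uniform starting prior, and the use of classifier probabilities as stand-ins for emission likelihoods are all assumptions made for this example.

```python
import math

# The four behaviors estimated in the abstract.
BEHAVIORS = ["sitting", "standing", "walking", "biking"]


def viterbi_smooth(minute_probs, stay=0.9):
    """Return the most likely behavior sequence for a run of minutes.

    minute_probs: list of dicts mapping behavior -> probability for one minute
                  (hypothetical classifier output).
    stay: ASSUMED probability that the behavior is unchanged from one minute
          to the next; the remainder is split evenly among the other behaviors.
    """
    if not minute_probs:
        return []
    n = len(BEHAVIORS)
    switch = (1.0 - stay) / (n - 1)
    eps = 1e-12  # guards against log(0)

    def log_trans(i, j):
        return math.log(stay if i == j else switch)

    # Start from a uniform prior over behaviors (an assumption).
    scores = [math.log(1.0 / n) + math.log(minute_probs[0][b] + eps)
              for b in BEHAVIORS]
    back = []  # back[t][j] = best previous state for state j at minute t + 1
    for probs in minute_probs[1:]:
        new_scores, pointers = [], []
        for j, b in enumerate(BEHAVIORS):
            best_i = max(range(n), key=lambda i: scores[i] + log_trans(i, j))
            new_scores.append(scores[best_i] + log_trans(best_i, j)
                              + math.log(probs[b] + eps))
            pointers.append(best_i)
        scores = new_scores
        back.append(pointers)

    # Trace the best path backwards from the final minute.
    state = max(range(n), key=lambda i: scores[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [BEHAVIORS[i] for i in path]
```

With these assumed settings, a single weakly supported "biking" minute inside a run of walking is relabeled as walking, while a sustained, confident change from walking to biking is preserved. In practice the transition probabilities would be estimated from annotated data rather than fixed by hand.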