Abstract

In longitudinal clinical studies, relatively few multivariate methods are available for the analysis of multivariate data. Here, we present Consensus Clustering (CClust), a new computational method based on clustering of time profiles and subsequent identification of correlations between clusters and predictors. Subjects are first clustered into groups according to the temporal profile of a response variable, using a robust consensus-based strategy. To discover which of the remaining variables are associated with the resulting groups, a non-parametric hypothesis test is performed between groups at every time point, and the results are then aggregated using Fisher's method. We test our approach on the EarlyBird cohort database, which contains temporal variations of clinical, metabolic, and anthropometric profiles in a population of 150 children followed up annually from age 5 to age 16. Our results show that the consensus-based method overcomes the approach-dependent results produced by current clustering algorithms, producing groups defined according to insulin resistance (IR) and biological age (Tanner score). Moreover, it provides biologically meaningful results, confirmed by hypothesis testing with most of the main clinical variables. These results position CClust as a valid alternative for the analysis of multivariate longitudinal data.
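The aggregation step described above combines the per-time-point p-values for one predictor into a single p-value. The following is a minimal sketch of Fisher's method using only the Python standard library, not the authors' implementation. The function name `fisher_combine` and the example p-values are hypothetical, and the choice of non-parametric test that produces the per-time-point p-values is left open because the abstract does not name it.

```python
import math


def fisher_combine(p_values):
    """Combine p-values with Fisher's method.

    The statistic -2 * sum(ln p_i) is referred to a chi-square
    distribution with 2k degrees of freedom, where k is the number
    of p-values. Returns (statistic, combined_p).
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    statistic = -2.0 * sum(math.log(p) for p in p_values)

    # For an even number of degrees of freedom (2k) the chi-square
    # survival function has a closed form:
    #   sf(x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    combined_p = math.exp(-half) * total
    return statistic, combined_p


# Hypothetical p-values for one predictor at four time points
# (illustrative numbers only, not taken from the study).
example_p = [0.04, 0.20, 0.01, 0.35]
stat, p_combined = fisher_combine(example_p)
print(f"Fisher statistic = {stat:.3f}, combined p = {p_combined:.4f}")
```

Fisher's method assumes that the combined p-values are independent, so the combined value from this sketch is best read as a ranking aid when the tests share the same subjects across time points.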

Highlights

  • More than a third of children in the UK are overweight or obese[1] and the increasing worldwide prevalence of obesity and type 2 diabetes (T2D) in children is a serious public health concern

  • The anthropometric and clinical dataset includes repeated measurements of a panel of anthropometric and clinical variables for 149 subjects, namely body weight, body mass index, body composition data generated by dual-energy X-ray absorptiometry (DEXA), skinfold thickness, actigraphy, resting energy expenditure, pubertal Tanner scores, and fasting glucose and insulin

  • Each dataset has a number of missing measurements: rather than imputing the missing values, we decided to restrict our analysis to subjects with complete time series

Summary

Introduction

More than a third of children in the UK are overweight or obese[1] and the increasing worldwide prevalence of obesity and type 2 diabetes (T2D) in children is a serious public health concern. Since the development of T2D can be delayed or prevented by lifestyle and medical interventions, there is increasing awareness that early identification of children with susceptibility to diabetes is critical[3]. It is important to define the influence of childhood developmental stages on adiposity, IR and associated metabolic parameters. We address the methodological challenge of integrating and correlating the temporal variations of many different data types in the EarlyBird cohort from age 5 to age 16, including anthropometric, clinical and serum metabonomic data. Non-parametric or semi-parametric statistical models are widely employed to model complex curves of longitudinal trajectories[7]. These techniques are designed to handle a single dataset generated over time. The use of the consensus between cluster compositions as the final valid clustering provides the name for our approach (Consensus Clustering).
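One common way to take a consensus between cluster compositions is a co-association matrix: for each pair of subjects, record the fraction of clustering runs that place them in the same cluster, then keep together only the subjects that co-cluster often enough. The sketch below illustrates that general idea under stated assumptions and is not the authors' procedure. The function names, the `threshold` parameter, and the toy labelings are all hypothetical.

```python
def co_association(labelings):
    """Fraction of labelings in which each pair of subjects shares a cluster.

    `labelings` is a list of label sequences, one per clustering run,
    all of the same length (one label per subject).
    """
    n = len(labelings[0])
    runs = len(labelings)
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        if len(labels) != n:
            raise ValueError("all labelings must cover the same subjects")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / runs for c in row] for row in counts]


def consensus_groups(labelings, threshold=1.0):
    """Group subjects linked by co-association >= threshold.

    Groups are the connected components of the thresholded
    co-association graph. With threshold=1.0 two subjects end up
    together only if every run agrees on them.
    """
    matrix = co_association(labelings)
    n = len(matrix)
    group = [-1] * n
    next_id = 0
    for start in range(n):
        if group[start] != -1:
            continue
        group[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if group[j] == -1 and matrix[i][j] >= threshold:
                    group[j] = next_id
                    stack.append(j)
        next_id += 1
    return group


# Toy example: three hypothetical clustering runs over six subjects.
# Runs a and b give the same partition with permuted label names.
run_a = [0, 0, 0, 1, 1, 1]
run_b = [1, 1, 1, 0, 0, 0]
run_c = [0, 0, 1, 1, 1, 1]
print(consensus_groups([run_a, run_b, run_c]))       # [0, 0, 1, 2, 2, 2]
print(consensus_groups([run_a, run_b, run_c], 0.6))  # [0, 0, 0, 1, 1, 1]
```

Working with pairwise co-membership rather than raw labels makes the result independent of how each algorithm happens to name its clusters, which is why runs a and b above count as full agreement.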
