Abstract

Aging has pronounced effects on blood laboratory biomarkers used in the clinic. Prior studies have largely investigated one biomarker or population at a time, limiting a comprehensive view of biomarker variation and aging across different populations. Here we develop a supervised machine learning approach to study aging using 356 blood biomarkers measured in 67,563 individuals across diverse populations. Our model predicts age with a mean absolute error (MAE), or average magnitude of prediction errors, in held-out data of 4.76 years and an R2 value of 0.92. Age prediction was highly accurate for the pediatric cohort (MAE = 0.87, R2 = 0.94) but inaccurate for ages 65+ (MAE = 4.30, R2 = 0.25). Variability was observed in which biomarkers carry predictive power across age groups, genders, and race/ethnicity groups, and novel candidate biomarkers of aging were identified for specific age ranges (e.g. Vitamin E, ages 18-44). We show that predictors for one age group may fail to generalize to other groups and investigate non-linearity in biomarkers near adulthood. As populations worldwide undergo major demographic changes, it is increasingly important to catalogue biomarker variation across age groups and discover new biomarkers to distinguish chronological and biological aging.

Highlights

  • Aging has pronounced effects on blood laboratory biomarkers used in the clinic such as testosterone [1] and plasma fibrinogen [2]

  • While machine learning has been widely applied to fields such as medical imaging [14, 15] and speech recognition [16, 17], it is comparatively underapplied in the study of blood laboratory biomarkers [18, 19], which may be among the cheapest to measure in individuals

  • We trained a random forest model [21] to predict chronological age using data from 67,563 individuals ranging in age from 1 to 85 years from nine Centers for Disease Control and Prevention (CDC) National Health and Nutrition Examination Survey (NHANES) cohorts spanning 1999-2016 (Figure 1), a representative sample of the non-institutionalized population of the www.aging-us.com

Read more

Summary

Introduction

Aging has pronounced effects on blood laboratory biomarkers used in the clinic such as testosterone [1] and plasma fibrinogen [2]. Studies of laboratory analytes and aging have traditionally considered a single analyte at a time [5,6,7] and have been limited in their inclusion of demographically diverse groups [8]. Modeling many blood biomarkers together across population groups paints a more complete picture of health and disease and enables the systematic study of differences resulting from the definitions of age based on time since birth (“chronological age”) and as a cumulative measure of biological wear and tear (“biological age”) [9]. Machine learning and statistical methods have enabled agnostic, data-driven approaches to age prediction based on methylation [10, 11], transcriptomic [12], and retinal imaging data [13]. While machine learning has been widely applied to fields such as medical imaging [14, 15] and speech recognition [16, 17], it is comparatively underapplied in the study of blood laboratory biomarkers [18, 19], which may be among the cheapest to measure in individuals

Objectives
Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call