Background
Public health science has made considerable efforts to understand the determinants of health. Although substantial gains have been made in understanding the determinants of population health, our ability to translate population-level discoveries into discriminating between cases and non-cases of disease at the individual level has been limited, despite the increasing availability of data. This study draws on recent advances in machine learning to explore whether such methods can revolutionise how we build predictive models of health using social survey data.

Methods
Data from the Understanding Society survey (wave 2 [2010–12]; 6830 individuals who took part in all aspects of data collection and for whom all data were available) were used to construct five types of data: personal (eg, age, sex), social (eg, occupation, education), health (eg, body weight, grip strength), biomarker (eg, cholesterol, hormones), and genetic. Outcomes were the presence of a limiting long-term illness and the type of illness or disability (eg, hypertension) at 1 and 5 years from baseline, modelled both as overall status and as new cases only. LASSO regression was used to reduce the explanatory measures (∼200) within each data type. Neural networks (deep learning) were trained with k-fold cross-validation on the training data (75% of the sample). Models were evaluated on the held-out test data (25%) using several fit statistics (eg, accuracy, sensitivity, specificity) and compared with simpler logistic regression models.

Findings
Health data were the strongest predictors of future health status (test-data accuracy 71%), and personal data were the weakest (61%). Within the health data, physical activity and the presence of some health conditions were strong individual predictors. The data supported only shallow learning: more complex models added little or reduced performance.
However, the models offered only marginal improvements (∼1–2% in accuracy) over logistic regression models.

Interpretation
The project makes two main contributions to public health science: an evaluation of different data types and their relative contributions as predictors of health status, and an exploration of the potential of machine learning to improve predictive models of ill health.

Funding
Understanding Society Biomedical Data Fellowship Programme. The funder had no role in the research.
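The evaluation step described in Methods (a 75%/25% train/test split, with models compared on accuracy, sensitivity and specificity) can be sketched in plain Python. This is an illustrative sketch only, not the study's code: the function names `train_test_split_indices` and `fit_stats`, the random seed, and the toy labels are hypothetical, and the LASSO and neural network steps are not reproduced.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class FitStats:
    accuracy: float     # share of all individuals classified correctly
    sensitivity: float  # true-positive rate: cases correctly identified
    specificity: float  # true-negative rate: non-cases correctly identified


def train_test_split_indices(
    n: int, test_fraction: float = 0.25, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: shuffle row indices and hold out a test fraction."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded so the split is reproducible
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]  # (train indices, test indices)


def fit_stats(y_true: Sequence[int], y_pred: Sequence[int]) -> FitStats:
    """Binary fit statistics, assuming labels are coded 1 = case, 0 = non-case."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn
    nan = float("nan")
    return FitStats(
        accuracy=(tp + tn) / n if n else nan,
        sensitivity=tp / (tp + fn) if (tp + fn) else nan,
        specificity=tn / (tn + fp) if (tn + fp) else nan,
    )


if __name__ == "__main__":
    # Toy labels for illustration only; these are not study data.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 0, 0, 0, 0, 1, 0, 1]
    print(fit_stats(y_true, y_pred))
    # FitStats(accuracy=0.625, sensitivity=0.5, specificity=0.75)
```

In the toy example, an accuracy of 62.5% hides a marked imbalance between how well cases (50%) and non-cases (75%) are identified, which is why reporting sensitivity and specificity alongside accuracy is informative when comparing neural network and logistic regression models.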