Background Advances in statistical modelling and machine learning, which can be deployed locally using open-source programming languages, represent a unique opportunity to improve workflows and lower costs in health care across the globe through the creation of in silico biomarkers. The goal of this study was to extract meaningful data from the publicly available Prematurity and Respiratory Outcomes Program (PROP) trial data that could help generate useful clinical diagnostic aids deployable at minimal cost in global healthcare settings.
Methods We conducted a cluster analysis of the PROP dataset. Using an open-source software platform, we built a simple model that predicts growth in patients born at less than 30 weeks' gestation. We then obtained validation data from a Uruguayan hospital to test the capacity of the models for deployment.
Results Analysis revealed two main clusters of patients in the trial, differentiated mainly by the clinical and anthropometric measurements of birth gestational age, birth weight, and head circumference. The anthropometric measurements of daily weight, birth weight, head circumference, and birth gestational age were highly correlated with respiratory dysfunction and co-morbidities. Deviation from the predicted growth curve in PROP patients was associated with culture-proven sepsis, and may represent a more sensitive anthropometric biomarker than the weight percentile systems routinely used globally, such as Fenton curves. Early deviation from our projected growth model was highly associated with patient fatality. However, over long-term predictions, models trained on PROP clinical trial patients showed significantly more error in the Uruguayan patients.
Conclusions Although the prediction models built on PROP data were not generalizable to Uruguayan patients, our data suggest that prediction models using simple anthropometric measurements, if trained on local patients, may provide value as a low-cost in silico biomarker. We conclude that local investment in clinical informatics infrastructure is needed to train models on locally extracted clinical data.