Data-driven prediction for battery state estimation provides a simple alternative to standard electrical and/or electrochemical models. Recent efforts have used cycling data from laboratory measurements to predict the state of health (SOH) of a cell over its lifetime [1-5]. Laboratory measurements attempt to capture the expected conditions during aging; however, because electric vehicles experience complex and varied conditions as they age, it is not possible to represent every aging path in a laboratory experiment. Additionally, due to resource and time limitations, measurements extending beyond one year are rare. In this work, a global data set of over 15,000 readings from 300 vehicles driven under real-world conditions is used to provide context on vehicle aging and as a training set for a data-driven SOH prediction model. These data differ significantly from traditional laboratory data in that they do not contain time-series samples of, for example, current and voltage, as used in most models, but rather values averaged over several days and sometimes months. At relatively frequent intervals during the life of the vehicle, variables such as state of charge, temperature, and kilometers driven are compiled as averages over the measurement interval, resulting in one value per variable for each measurement period. These averaged readings offer a snapshot of the conditions the vehicle experienced during the measurement period, but they do not offer the level of detail that would be recorded in a laboratory setting. From these data, a novel data-driven SOH prediction model is developed. The SOH prediction is compared against the onboard SOH estimation performed in the vehicle, which itself has been validated by the vehicle manufacturers. Given the large number of samples, a data-driven approach is applied to predict the battery capacity of a vehicle.
Using Keras, a neural network API written in Python, a two-layer neural network was cross-validated with 50 % training data and 50 % testing data, achieving high prediction accuracy using six input parameters. This method has an advantage over traditional SOH prediction models, such as semi-empirical and equivalent circuit models, in that it can predict the SOH of vehicles under dynamic aging conditions over many years and requires neither time-series values of voltage and current as inputs nor cell parameterization, which are often inaccurate and costly to measure.

[1] Barré, F. Saurd, M. Gérard, M. Montaru, D. Riu, Statistical analysis for understanding and predicting battery degradations in real-life electric vehicle use. Journal of Power Sources 245 (2014) 846-856.
[2] Zou, X. Hu, H. Ma, S. Li, Combined State of Charge and State of Health estimation over lithium-ion battery cell cycle lifespan for electrical vehicles. Journal of Power Sources 273 (2015) 793-803.
[3] You, S. Park, D. Oh, Real-time state-of-health estimation for electric vehicle batteries: A data-driven approach. Applied Energy 176 (2016) 92-103.
[4] Li, C. Zou, M. Berecibar, E. Nanini-Maury, J. C.-W. Chan, P. van den Bossche, J. Van Mierlo, N. Omar, Random forest regression for online capacity estimation of lithium-ion batteries. Applied Energy 232 (2018) 197-210.
[5] Richardson, M. A. Osborne, D. A. Howey, Gaussian process regression for forecasting battery state of health. Journal of Power Sources 357 (2017) 208-219.

Figure 1
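As an illustration of the kind of model the abstract describes, the following is a minimal sketch of a small Keras regression network with six inputs, evaluated with a 50/50 split in which each half serves once as the test set. It is not the authors' implementation: the abstract does not state the six input parameters, the layer sizes, the activation, the optimizer, or the loss, so `hidden_units`, the ReLU activation, Adam, mean squared error, the reading of "two-layer" as one hidden layer plus one output layer, and the two-fold interpretation of the 50/50 cross-validation are all assumptions. The data below are random placeholders, not the vehicle data set.

```python
import numpy as np
from tensorflow import keras

N_FEATURES = 6  # six input parameters per the abstract; which six is not stated


def build_model(hidden_units=16):
    """Two Dense layers: one hidden layer and a linear SOH output (assumed layout)."""
    model = keras.Sequential([
        keras.Input(shape=(N_FEATURES,)),
        keras.layers.Dense(hidden_units, activation="relu"),  # assumed size/activation
        keras.layers.Dense(1),  # regression output: predicted SOH
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])  # assumed choices
    return model


def two_fold_scores(X, y, epochs=20, seed=0):
    """Split the readings 50/50, train on one half, test on the other, then swap."""
    idx = np.random.default_rng(seed).permutation(len(X))
    half = len(X) // 2
    folds = [(idx[:half], idx[half:]), (idx[half:], idx[:half])]
    scores = []
    for train_idx, test_idx in folds:
        # Standardize using statistics from the training half only
        mean = X[train_idx].mean(axis=0)
        std = X[train_idx].std(axis=0) + 1e-8
        model = build_model()
        model.fit((X[train_idx] - mean) / std, y[train_idx],
                  epochs=epochs, batch_size=32, verbose=0)
        _, mae = model.evaluate((X[test_idx] - mean) / std, y[test_idx], verbose=0)
        scores.append(float(mae))
    return scores  # mean absolute error on each held-out half


# Placeholder data only: random numbers standing in for interval-averaged readings
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, N_FEATURES)).astype("float32")
y_demo = (1.0 - 0.05 * np.abs(X_demo[:, 0])).astype("float32")
scores = two_fold_scores(X_demo, y_demo, epochs=2)
```

In this sketch each row of `X` would hold the averaged values for one measurement period, so no voltage or current time series is needed as input. Standardizing with training-half statistics only keeps information from the test half out of the fit.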