Data-driven turbulence modelling is becoming common practice in the field of fluid mechanics. Complex machine learning methods are applied to large high fidelity data sets in an attempt to discover relationships between mean flow features and turbulence model parameters. However, a clear discrepancy is emerging between complex models that appear to fit the high fidelity data well a priori and simpler models which subsequently hold up in a posteriori testing through CFD simulations. With this in mind, a novel error quantification technique is proposed, consisting of an upper and a lower bound against which data-driven turbulence models can be systematically assessed. At the lower bound are models that are linear in either the full set or a subset of the input features, with feature selection used to determine the best model. Any machine learning technique must improve on this performance for the extra complexity in training to be of practical use. The upper bound is found by the stable insertion of the high fidelity data for the Reynolds stresses into CFD simulations. Three machine learning methods, Gene Expression Programming, Deep Neural Networks and Gaussian Mixture Models, are presented and assessed against this error quantification technique. We further show that for the simple canonical cases often used to develop data-driven methods, lower-bound linear models can provide very satisfactory accuracy and stability, with limited scope for substantial improvement through more complex machine learning methods.
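To make the lower-bound idea concrete, the following is a minimal sketch of a linear baseline with greedy forward feature selection, fitted by ordinary least squares. The synthetic features and targets, the selection budget, and the use of training MSE as the selection criterion are illustrative assumptions, not the procedure used in this work.

```python
import numpy as np

def fit_linear(X, y):
    # Ordinary least squares with an intercept column.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def mse(X, y, coef):
    # Mean squared error of the linear model on (X, y).
    A = np.column_stack([np.ones(len(X)), X])
    return float(np.mean((A @ coef - y) ** 2))

def forward_select(X, y, max_features):
    # Greedy forward selection: at each step, add the candidate
    # feature that most reduces the training MSE.
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        scores = []
        for j in remaining:
            cols = selected + [j]
            scores.append((mse(X[:, cols], y, fit_linear(X[:, cols], y)), j))
        _, best_j = min(scores)
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# Synthetic example: 5 candidate features, only features 1 and 3 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 1] - 0.5 * X[:, 3] + 0.1 * rng.normal(size=200)

chosen = forward_select(X, y, max_features=2)
print(sorted(chosen))  # -> [1, 3]
```

In practice, any more complex data-driven closure would need to beat the held-out error of such a baseline to justify its additional training cost.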