Abstract

How much information does a dataset contain about an outcome of interest? To answer this question, estimates are generated for a given dataset, representing the minimum possible absolute prediction error for an outcome variable that any model could achieve. The estimate is produced using a constrained omniscient model that mandates only that identical observations receive identical predictions, and that observations which are very similar to each other receive predictions that are alike. It is demonstrated that the resulting prediction accuracy bounds function effectively on both simulated data and real-world datasets. This method generates bounds on predictive performance typically within 10% of the performance of the true model, and performs well across a range of simulated and real datasets. Three applications of the methodology are discussed: measuring data quality, model evaluation, and quantifying the amount of irreducible error in a prediction problem.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.