Abstract

Linear models are among the most straightforward and commonly used modelling approaches. Consider modelling approximately monotonic response data arising from a time-related process. If one knows when the process began or ended, then one may be able to leverage additional assumed data to reduce prediction error. This assumed data, referred to as the "anchor," is treated as an additional data-point generated at either the beginning or end of the process. The response value of the anchor is set to an intelligently selected value of the response (such as the upper bound, lower bound, or 99th percentile of the response, as appropriate). The anchor reduces the variance of prediction at the cost of a possible increase in prediction bias, potentially reducing the overall mean-square prediction error. This can be extremely effective when few individual data-points are available, allowing one to make linear predictions using as few as a single observed data-point. We develop the mathematics showing the conditions under which an anchor can improve predictions, and demonstrate how this approach reduces prediction error when modelling the disease progression of patients with amyotrophic lateral sclerosis.
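To make the anchor idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes time is measured in days from the start of the process, that the anchor response is the upper bound of the scale (48, the ALSFRS-R maximum), and that the anchor is given the same weight as an observed point. The function names `fit_line` and `fit_with_anchor` and the single observation (day 300, score 39) are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), x])  # intercept column + time
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1]

def fit_with_anchor(x, y, anchor_x, anchor_y):
    """Append one assumed data-point (the anchor) and fit by OLS."""
    x_aug = np.append(np.asarray(x, dtype=float), anchor_x)
    y_aug = np.append(np.asarray(y, dtype=float), anchor_y)
    return fit_line(x_aug, y_aug)

# Hypothetical example: a single observed score of 39 at day 300,
# plus an assumed anchor of 48 (upper bound) at day 0 (start of process).
b0, b1 = fit_with_anchor([300.0], [39.0], anchor_x=0.0, anchor_y=48.0)

# Linear prediction at day 700 from only one observed data-point.
pred = b0 + b1 * 700.0
```

With one observation and one anchor, the fitted line simply passes through both points, which is why a prediction is possible at all; with more observations the anchor acts as one extra point pulling the fit toward the assumed starting value.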

Highlights

  • Prediction has always been an important part of statistical modeling

  • Our validation method is as follows: we compare the standard model and the anchor model on their ability to predict each patient k's first ALSFRS-R score after 365 days (1 year), observed at time x_{k,0}, using only ALSFRS-R scores measured before 92 days (3 months)

  • We discussed a simple and computationally inexpensive technique that may improve the predictive power in linear models

Summary

Introduction

With the advent of big data and the rise of machine learning, one may think that researchers have moved beyond prediction via simple linear models. This is not the case, especially in the field of medical research: a quick search of PubMed returns over 1000 publications that used linear (but not generalized linear) models between January 2016 and July 2017. The progression of the ALSFRS-R tends to be very linear [3, 4], but because the score is bounded, simple linear models have the inherent structural defect of producing predictions that violate these lower and upper bounds. Many adjustments for this problem exist: examples include truncating the prediction to 48 if it is too large (or to 0 if it is too small) [5] or performing a logistic transform on the data [6]. It is interesting to point out that this transformation has no impact on the OLS estimators for σ²
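The truncation adjustment mentioned above can be sketched as follows. This is an illustrative sketch only, not the procedure of [5]: it assumes a fitted intercept and slope are already available and clips the linear prediction to the ALSFRS-R range of 0 to 48. The function name `truncated_prediction` and the coefficient values are hypothetical.

```python
import numpy as np

# Bounds of the ALSFRS-R scale as stated in the text.
ALSFRS_R_MIN, ALSFRS_R_MAX = 0.0, 48.0

def truncated_prediction(b0, b1, t):
    """Linear prediction b0 + b1 * t, clipped to the scale's bounds."""
    raw = b0 + b1 * np.asarray(t, dtype=float)
    return np.clip(raw, ALSFRS_R_MIN, ALSFRS_R_MAX)

# Hypothetical coefficients: the raw line exceeds 48 at day 0
# and falls below 0 at day 1200, so both ends get truncated.
preds = truncated_prediction(50.0, -0.05, [0.0, 500.0, 1200.0])
```

Clipping guarantees predictions inside the valid range but leaves the fitted line itself unchanged, so it only corrects the symptom at prediction time.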

Utilizing an Anchor Reduces Predictive Variability
Predictive Bias Caused by Utilizing an Anchor
Using an Anchor to Reduce the Mean Square Predictive Error
Application to ALS Prediction
Discussion