Abstract

We investigate a modification of the classical fixed effects panel data model (a linear regression model able to represent unobserved heterogeneity in the data) in which one has the additional possibility of controlling the conditional variance of the output given the input by varying the cost associated with the supervision of each training example. Assuming an upper bound on the total supervision cost, we analyze and optimize the trade-off between the sample size and the precision of supervision (the reciprocal of the conditional variance of the output) by formulating and solving a suitable optimization problem, based on a large-sample approximation of the output of the classical algorithm used to estimate the parameters of the fixed effects panel data model. Considering a specific functional form for that precision, we prove that, depending on the "returns to scale" of the precision with respect to the supervision cost per example, in some cases "many but bad" examples yield a smaller generalization error than "few but good" ones, whereas in other cases the opposite occurs. These results extend to the fixed effects panel data model those we obtained in recent works for a simpler linear regression model. We conclude by discussing possible applications of our results and extensions of the proposed optimization framework to other panel data models.
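To make the setting concrete, the following is a minimal sketch of the model and of the budget-constrained trade-off described above. The notation is illustrative and not taken from the paper: the power-law precision lambda(c) = k c^a, the symbols N (number of units), T (time periods), c (supervision cost per example), and C (total budget) are all assumptions introduced here for exposition.

% Sketch (compilable LaTeX) of a fixed effects panel data model with
% cost-dependent supervision precision; all notation is assumed.
\documentclass{article}
\usepackage{amsmath}
\begin{document}

% Fixed effects panel data model: the unit-specific intercepts \alpha_i
% capture unobserved heterogeneity across the N units.
\begin{equation*}
  y_{it} = \alpha_i + \beta^{\top} x_{it} + \varepsilon_{it},
  \qquad i = 1, \dots, N, \quad t = 1, \dots, T,
\end{equation*}
% where the supervision cost per example c controls the noise level
% through the precision lambda(c), here given an assumed power-law form:
\begin{equation*}
  \operatorname{Var}\left(\varepsilon_{it} \mid x_{it}\right)
  = \frac{1}{\lambda(c)},
  \qquad \lambda(c) = k\, c^{a}, \quad k, a > 0.
\end{equation*}
% Budget-constrained trade-off: a larger sample (larger N) forces a smaller
% per-example cost c, hence lower precision; one minimizes a large-sample
% approximation Err of the generalization error of the fixed effects
% estimator subject to the total supervision budget:
\begin{equation*}
  \min_{N,\, c > 0} \; \mathrm{Err}\bigl(N, \lambda(c)\bigr)
  \quad \text{subject to} \quad N\, T\, c \le C.
\end{equation*}

\end{document}

Under this assumed parameterization, the exponent a plays the role of the "returns to scale" of the precision with respect to the supervision cost per example, and its value determines whether "many but bad" or "few but good" examples achieve the smaller generalization error.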
