I propose a machine learning approach to enrich administrative data with information that is available only in survey data. To illustrate the approach, I replicate a prominent study that used survey data to analyze the introduction of the federal minimum wage in Germany. In contrast to the original study, I use the universe of German establishments rather than the limited number of establishments that participated in the survey. As the administrative data do not contain information on whether establishments were treated by the minimum wage, I use a random forest classifier, trained on survey data, to predict the treatment status of establishments. The results obtained using the administrative data are qualitatively similar to those obtained using the survey data. Beyond replicating previous research, this approach broadens the research potential of administrative data, enabling researchers to explore more detailed research questions at scale.
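
The prediction step described above can be sketched as follows. This is a minimal illustration, assuming scikit-learn and purely synthetic data, not the study's implementation: the feature names, sample sizes, hyperparameters, and the rule that generates treatment status are all hypothetical. It trains a random forest on a small labelled "survey" sample and uses it to predict treatment status for a larger "administrative" sample in which the label is unobserved.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def simulate_establishments(n, rng):
    """Synthetic stand-in for establishment records (all columns hypothetical)."""
    mean_wage = rng.lognormal(mean=2.6, sigma=0.3, size=n)  # average hourly wage
    n_employees = rng.integers(1, 500, size=n)              # establishment size
    part_time_share = rng.uniform(0.0, 1.0, size=n)         # share of part-time staff
    X = np.column_stack([mean_wage, n_employees, part_time_share])
    # Toy rule (an assumption): low-wage, part-time-heavy establishments
    # are more likely to be treated by the minimum wage.
    score = -1.5 * (mean_wage - 12.0) + 2.0 * part_time_share + rng.normal(0.0, 1.0, n)
    treated = (score > 1.0).astype(int)
    return X, treated


rng = np.random.default_rng(0)

# Small survey sample: features and treatment status are both observed.
X_survey, y_survey = simulate_establishments(2_000, rng)

# Large administrative sample: same features, treatment status unobserved.
# The simulated labels are discarded to mimic that situation.
X_admin, _ = simulate_establishments(20_000, rng)

# Hold out part of the survey sample to check out-of-sample accuracy.
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X_survey, y_survey, test_size=0.25, random_state=0, stratify=y_survey
)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
holdout_accuracy = clf.score(X_holdout, y_holdout)

# Predicted treatment status for every administrative record.
predicted_treatment = clf.predict(X_admin)

print(f"holdout accuracy: {holdout_accuracy:.3f}")
print(f"predicted treated share: {predicted_treatment.mean():.3f}")
```

The holdout split matters because the predicted labels are only as useful as the classifier's out-of-sample accuracy: errors in predicted treatment status carry over into any downstream estimate that uses them.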