In livestock production, animal-related data are often registered in specialised databases that are usually not interconnected, except through a common identifier. Analysis of combined datasets, and the possible inclusion of third-party information, can provide a more complete picture or reveal complex relationships. The aim of this study was to develop a risk index to predict farms with an increased likelihood of animal welfare violations, defined as non-compliance during on-farm welfare inspections. A data-driven approach was chosen for this purpose, focusing on the combination of existing Swiss government databases and registers. Individual animal-level data were aggregated at the herd level. Since data collection and availability were best for cattle and pigs, the focus was on these two livestock species. We present machine learning models that can be used as a tool to plan and optimise risk-based on-farm welfare inspections by proposing a consolidated list of priority holdings to be visited. The results of previous on-farm welfare inspections were used to calibrate a binary welfare index, which serves as the prediction target. The risk index is based on proxy information, such as participation in animal welfare programmes with structured housing and outdoor access, herd type and size, or animal movement data. Since transparency of the model is critical both for public acceptance of such a data-driven index and for farm control planning, the Random Forest model, for which the decision process can be illustrated, was investigated in depth. Using historical inspection data with an overall low prevalence of violations of approximately 4% for both species, the developed index was able to predict violations with a sensitivity of 81.2% and 79.5% for cattle and pig farms, respectively. The study shows that combining multiple and heterogeneous data sources improves the quality of the models.
Furthermore, privacy-preserving methods were applied in a research environment to explore the available data before restricting the feature space to the most relevant features. This study demonstrates that data-driven monitoring of livestock populations is already possible with the existing datasets, and that the models developed can be a useful tool to plan and conduct risk-based animal welfare inspections.
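To make the reported metric concrete, the following minimal sketch shows how sensitivity is computed from binary inspection outcomes and model predictions when violations are rare. It is an illustration only: the counts are synthetic and chosen to mimic a prevalence of about 4%, the function names are hypothetical, and nothing here reproduces the study's data, features, or Random Forest model.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = violation)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Share of farms with a violation that the index flags."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp, fp):
    """Share of flagged farms that actually had a violation."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Synthetic example (assumed numbers, not from the study):
# 1000 inspected farms, 40 with a violation (4% prevalence).
y_true = [1] * 40 + [0] * 960
# Hypothetical predictions: 32 of 40 violations flagged, 192 of 960 compliant farms flagged.
y_pred = [1] * 32 + [0] * 8 + [1] * 192 + [0] * 768

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
sens = sensitivity(tp, fn)
prec = precision(tp, fp)
print(f"sensitivity={sens:.3f}, precision={prec:.3f}, flagged={tp + fp}")
```

In this synthetic setting most violating farms are flagged, yet the flagged list still contains many compliant farms because violations are rare. This is why a risk index of this kind is better read as a tool for prioritising visits than as a verdict on an individual holding.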