Virtual control groups (VCGs) in nonclinical toxicity studies represent the concept of replacing concurrent control group animals with appropriate historical control data. Historical control data collected from standardized studies can serve as a basis for constructing VCGs, and legacy study reports can serve as a benchmark for evaluating VCG performance. Replacing the concurrent controls of legacy studies with VCGs should ideally reproduce the results of these studies. Based on three four-week rat oral toxicity legacy studies with varying degrees of toxicity findings, we developed a concept to evaluate VCG performance on three levels: the ability of VCGs to (i) reproduce statistically significant deviations from the concurrent control, (ii) reproduce test substance-related effects, and (iii) reproduce the conclusions of the toxicity study in terms of threshold dose, target organs, toxicological biomarkers (clinical pathology), and reversibility. Although the VCGs showed a low to moderate ability to reproduce statistical results, the general study conclusions remained unchanged. Our results provide a first indication that carefully selected historical control data can replace concurrent controls without impairing the general study conclusions. Additionally, the developed procedures and workflows lay the foundation for the future validation of virtual controls for use in regulatory toxicology.