Energy performance certificate (EPC) databases are crucial for analysing building stocks and informing relevant policy interventions across Europe. Initially designed for building-to-building energy efficiency assessments, EPCs and the resultant EPC databases are now used to, inter alia, evaluate stock energy efficiency, monitor the impacts of national policies, and estimate the financial viability of interventions. Consumers use EPCs when seeking financial incentives such as “green” mortgages, while financial institutions use EPCs to account for renovation costs in mortgage assessments. The range of stakeholders using EPC data thus extends beyond those who are experts in building energy rating. Errors in manually inputting data from on-site surveys into calculations can undermine EPC results. Currently, no standardised method exists for validating EPC datasets. This research introduces the first automated, data-driven validation of an EPC dataset, one that adapts as the database evolves. Seventeen unique filters were developed, revealing that 30% of EPC entries were erroneous and/or outlier data, with 80% of these related to misassessment of geometrical features. Errors in one field often correlated with errors in others, suggesting that some Assessors take little responsibility for data quality. The validation method, automated through Python and R scripts, aligns the Irish EPC database taxonomy with the Irish and European Building Stock Observatories, facilitating consistent future reporting. Recommendations are provided to enhance EPC data quality.