First-principles computations reliably predict the energetics of point defects in semiconductors but are constrained by the expense of using large supercells and advanced levels of theory. Machine learning models trained on computational data, especially ones that sufficiently encode defect coordination environments, can be used to accelerate defect predictions. Here, we develop a framework for the prediction and screening of native defects and functional impurities in a chemical space of group IV, III–V, and II–VI zinc blende semiconductors, powered by crystal graph-based neural networks (GNNs) trained on high-throughput density functional theory (DFT) data. By sampling partially optimized defect configurations from DFT calculations, we generate one of the largest computational defect datasets to date, containing many types of vacancies, self-interstitials, anti-site substitutions, impurity interstitials and substitutions, as well as some defect complexes. We apply three established GNN techniques, namely the crystal graph convolutional neural network (CGCNN), the materials graph network (MEGNet), and the atomistic line graph neural network (ALIGNN), to rigorously train models for predicting defect formation energy (DFE) in multiple charge states and chemical potential conditions. We find that ALIGNN yields the best DFE predictions, with root mean square errors around 0.3 eV. Relative to the range of DFE values in the dataset, this corresponds to a prediction accuracy of 98%, a significant improvement over the state of the art. We further show that GNN-based defective structure optimization can take us close to DFT-optimized geometries at a fraction of the cost of full DFT.
The current models are based on the semi-local generalized gradient approximation Perdew–Burke–Ernzerhof (PBE) functional but are highly promising for three reasons. First, the computed energetics and defect levels correlate with higher levels of theory and with experimental data. Second, novel metastable and low-energy defect structures need to be discovered, and can be discovered accurately, at the PBE level of theory before advanced methods are applied. Third, multi-fidelity models can be trained in the future with new data from non-local functionals. The DFT-GNN models enable prediction and screening across thousands of hypothetical defects based on both unoptimized and partially optimized defective structures, helping identify electronically active defects in technologically important semiconductors.
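The relation between the quoted root mean square error and the quoted accuracy can be illustrated with a minimal sketch. It assumes that accuracy is defined as one minus the RMSE divided by the span of DFE values in the dataset, which is one plausible reading of the statement above and not a definition confirmed here. The function names and the 15 eV span are hypothetical, chosen only because a 0.3 eV error over such a span gives 98%.

```python
import math


def rmse(predicted, reference):
    """Root mean square error between two equal-length sequences (eV)."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def range_normalized_accuracy(rmse_value, value_min, value_max):
    """Assumed definition: 1 - RMSE / (max - min) of the target values."""
    span = value_max - value_min
    if span <= 0:
        raise ValueError("value_max must be greater than value_min")
    return 1.0 - rmse_value / span


# Illustrative numbers only: the 0-15 eV span is an assumption,
# not a value reported for the dataset.
accuracy = range_normalized_accuracy(0.3, 0.0, 15.0)
print(f"{accuracy:.1%}")
```

Under this reading, the accuracy figure depends as much on how widely the formation energies are spread as on the model error, so the absolute RMSE in eV remains the more transferable number when comparing against other datasets.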