The Ames test is a gold-standard mutagenicity assay that uses various Salmonella typhimurium strains with and without S9 fraction to provide insights into the mechanisms by which a chemical can mutate DNA. Multitask deep learning is an ideal framework for developing QSAR models with multiple end points, such as the Ames test, because the joint training of multiple predictive tasks may synergistically improve the prediction accuracy of each task. This work investigated how toxicology domain knowledge can be used to handcraft task groupings that guide the training of multitask neural networks better than a naïve ungrouped multitask neural network developed on the complete set of tasks. Sixteen S. typhimurium ± S9 strain tasks were used to generate groupings based on mutagenic and metabolic mechanisms that were reflected in correlation data analyses. Both grouped and ungrouped multitask neural networks predicted the 16 strain tasks with a higher balanced accuracy than single-task controls, with grouped multitask neural networks consistently showing incremental increases in predictivity over the ungrouped approach. We conclude that the main variable driving these performance improvements is the general multitask effect, with mechanistic task groupings acting as an enhancement step that further concentrates synergistic training signals united by a common biological mechanism. This approach enables the incorporation of toxicology domain knowledge into multitask QSAR model development, allowing for more transparent and accurate Ames mutagenicity prediction.
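
The kind of correlation analysis that could support such task groupings can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes binary labels per strain task (1 = mutagenic, 0 = non-mutagenic, `None` = not tested), uses the phi coefficient as the correlation measure, and runs on made-up toy data. The task names and the helper `phi_coefficient` are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Optional, Sequence

# Assumed label encoding: 1 = mutagenic, 0 = non-mutagenic, None = not tested.
Label = Optional[int]


def phi_coefficient(a: Sequence[Label], b: Sequence[Label]) -> Optional[float]:
    """Phi coefficient (Pearson correlation of two binary vectors),
    computed only over compounds that were tested in both tasks."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n11 = sum(1 for x, y in pairs if x == 1 and y == 1)
    n10 = sum(1 for x, y in pairs if x == 1 and y == 0)
    n01 = sum(1 for x, y in pairs if x == 0 and y == 1)
    n00 = sum(1 for x, y in pairs if x == 0 and y == 0)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # Undefined when either task has a single class among shared compounds.
        return None
    return (n11 * n00 - n10 * n01) / denom


# Toy, invented labels for three hypothetical strain tasks (one column per compound).
tasks: Dict[str, List[Label]] = {
    "TA98_-S9": [1, 1, 0, 0, 1, None, 0, 0],
    "TA98_+S9": [1, 1, 0, 0, 1, 0, 0, None],
    "TA100_-S9": [0, 1, 1, 0, None, 0, 1, 0],
}

# Pairwise correlations; highly correlated tasks are candidates for one group.
correlations = {
    (t1, t2): phi_coefficient(tasks[t1], tasks[t2])
    for t1, t2 in combinations(tasks, 2)
}

for (t1, t2), phi in correlations.items():
    print(f"{t1} vs {t2}: {phi}")
```

In such a scheme, task pairs with high correlation would be candidates for the same group, while the final grouping would still rest on mechanistic judgment rather than on a correlation threshold alone.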