Today, computational tools for predicting the metabolite structures of xenobiotics are widely available and employed in small-molecule research. Reflecting the availability of measured data, these in silico tools are trained and validated primarily on drug metabolism data. In this work, we assessed the capacity of five leading metabolite structure predictors to represent the metabolism of agrochemicals observed in rats. More specifically, we tested the ability of SyGMa, GLORY, GLORYx, BioTransformer 3.0, and MetaTrans to correctly predict and rank the experimentally observed metabolites of a set of 85 parent compounds. We found that the models were able to recover about one-third to two-thirds of the experimentally observed first-, second-, and third-generation metabolites, confirming their value in applications such as metabolite identification. However, precision was low for all investigated tools and did not exceed approximately 18 % for the pool of first-generation metabolites and 2 % for the pool of compounds representing the first three generations of metabolites. The variance in prediction success rates was high across the individual metabolic maps, meaning that outcomes depend strongly on the specific compound under investigation. We also found that the predictions for individual parent compounds differed strongly between the tools, particularly between those built on orthogonal technologies (e.g., rule-based and end-to-end machine learning approaches). This makes ensemble model strategies promising for improving success rates. Overall, the results of this benchmark study show that there is still considerable room for improving metabolite structure predictors. Our discussion points out several avenues for progress. The bottleneck in method development certainly has been, and for the foreseeable future will remain, the limited quantity and quality of available measured data on small-molecule metabolism.
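
To make the two headline metrics concrete, the following minimal Python sketch computes recall (the fraction of observed metabolites recovered) and precision (the fraction of predictions that were observed) for one parent compound, and shows how a simple union of two tools' outputs changes both. It is an illustration under stated assumptions, not the evaluation code of this study: the function names, the union-based ensemble, and the toy labels (`M1`, `X1`, ...) are all hypothetical, and each structure is assumed to be represented by some canonical identifier so that set comparison is meaningful.

```python
def recall(predicted, observed):
    """Fraction of observed metabolites that appear among the predictions."""
    observed = set(observed)
    if not observed:
        return 0.0
    return len(observed & set(predicted)) / len(observed)


def precision(predicted, observed):
    """Fraction of predicted structures that were experimentally observed."""
    predicted = set(predicted)
    if not predicted:
        return 0.0
    return len(predicted & set(observed)) / len(predicted)


def union_ensemble(*prediction_sets):
    """Naive ensemble (an assumption for illustration): pool all predictions."""
    combined = set()
    for predictions in prediction_sets:
        combined |= set(predictions)
    return combined


# Toy labels standing in for canonical structure identifiers (hypothetical data).
observed = {"M1", "M2", "M3", "M4"}
rule_based = {"M1", "M2", "X1", "X2", "X3"}  # e.g. output of a rule-based tool
learned = {"M3", "X4", "X5"}                 # e.g. output of an ML-based tool

ensemble = union_ensemble(rule_based, learned)

for name, preds in [("rule-based", rule_based),
                    ("learned", learned),
                    ("ensemble", ensemble)]:
    print(f"{name}: recall={recall(preds, observed):.2f}, "
          f"precision={precision(preds, observed):.2f}")
```

In this toy case the pooled predictions recover more of the observed metabolites than either tool alone, while precision is diluted by the additional false positives. This is why the ranking of predictions matters when complementary tools are combined.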