Unconstrained generation of synthetic antibody-antigen structures to guide machine learning methodology for antibody specificity prediction.

Philippe A Robert, Aurél Prósz, Dag Trygve Tryslew Haug, Rahmad Akbar, Michael Widrich, Geir Kjetil Sandve, Victor Greiff, Ingrid Hobæk Haff, Milena Pavlović, Ingvild Frøberg Mathisen, Lonneke Scheffer, Alex Olar, Enkelejda Miho, Sepp Hochreiter, Günter Klambauer, Igor Snapkov, Maria Chernigovskaya, Andrei Slabodkin, Eva Smorodina, Mai Ha Vu, Robert Frank, Puneet Rawat, Brij Bhushan Mehta, Fridtjof Lund‐Johansen, Krzysztof Jan Abram

doi:10.1038/s43588-022-00372-4

Abstract

Machine learning (ML) is a key technology for accurate prediction of antibody-antigen binding. Two orthogonal problems hinder the application of ML to antibody-specificity prediction and the benchmarking thereof: the lack of a unified ML formalization of immunological antibody-specificity prediction problems and the unavailability of large-scale synthetic datasets to benchmark real-world relevant ML methods and dataset design. Here we developed the Absolut! software suite that enables parameter-based unconstrained generation of synthetic lattice-based three-dimensional antibody-antigen-binding structures with ground-truth access to conformational paratope, epitope and affinity. We formalized common immunological antibody-specificity prediction problems as ML tasks and confirmed that for both sequence- and structure-based tasks, accuracy-based rankings of ML methods trained on experimental data hold for ML methods trained on Absolut!-generated data. The Absolut! framework has the potential to enable real-world relevant development and benchmarking of ML strategies for biotherapeutics design.

Full Text