Abstract

A set of highly homologous proteins has been found to play a critical role in the neuromorphogenesis of the Drosophila melanogaster nervous system. These proteins, the Dprs (defective proboscis extension response proteins) and the DIPs (Dpr-interacting proteins), with 21 and 11 members, respectively, can form 231 DIP-Dpr complexes. Only a subset of 57 complexes, referred to as cognate partners, bind with high affinity and are detected by surface plasmon resonance (SPR) experiments. The remaining complexes either went undetected or fell below the detection threshold, leaving their actual affinities uncertain. Nevertheless, on the basis of similarity, phylogenetic classification, and binding experiments, the proteins have been mapped as cognate and non-cognate partners. To classify DIP-Dpr complexes accurately and to elucidate the molecular basis of their interactions, we used machine learning (ML) approaches capable of handling this large number of protein-protein complexes and interactions. Using input features that combine evolutionary, structural, and biochemical properties, we obtained good accuracy with two different algorithms: linear discriminant analysis (88%) and random forest (98%). To test the robustness of our method, we applied the trained models to DIPs and Dprs recovered from thirteen other Drosophila species. For each species, the prediction accuracy matched the values obtained with Drosophila melanogaster, confirming the robustness of our models. This methodology for developing ML models that accurately classify binding and non-binding partners among homologous protein families is transferable to any relevant biological system and could help find the binding partners of new family members (mutants and newly identified or designed proteins).
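To put the reported accuracies in context, the class balance implied by the numbers above can be worked out directly. The sketch below is illustrative only: the identifiers `Dpr_1`, `DIP_1`, and so on are placeholders rather than real protein names, and it assumes that all complexes outside the 57 cognate pairs are treated as non-cognate, which the abstract notes is itself uncertain.

```python
from itertools import product

# Placeholder identifiers: only the family sizes (21 Dprs, 11 DIPs)
# and the count of 57 cognate complexes come from the abstract.
dprs = [f"Dpr_{i}" for i in range(1, 22)]
dips = [f"DIP_{j}" for j in range(1, 12)]

# Every possible DIP-Dpr pairing.
pairs = list(product(dips, dprs))
n_total = len(pairs)

n_cognate = 57
# Assumption: every remaining complex is counted as non-cognate.
n_non_cognate = n_total - n_cognate

cognate_fraction = n_cognate / n_total
# Accuracy of a trivial classifier that always predicts "non-cognate".
majority_baseline = n_non_cognate / n_total

print(n_total, n_non_cognate)                               # 231 174
print(f"{cognate_fraction:.3f} {majority_baseline:.3f}")    # 0.247 0.753
```

Under that assumption, roughly a quarter of the complexes are cognate, so a classifier that always answers "non-cognate" would already reach about 75% accuracy. This is the reference point against which the 88% and 98% figures are best read.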
