Protein-protein interaction (PPI) networks are a fundamental resource for modeling cellular and molecular function, and a large and sophisticated toolbox has been developed to leverage their structure and topological organization to predict the functional roles of under-studied genes, proteins, and pathways. However, the overwhelming majority of the experimentally determined interactions from which such networks are constructed come from a small number of well-studied model organisms. Indeed, most species lack even a single experimentally determined interaction in these databases, much less a network to enable the analysis of cellular function, and computational PPI predictions are too noisy to use directly. We introduce PHILHARMONIC, a novel computational approach that couples deep learning-based de novo network inference with robust unsupervised spectral clustering algorithms to uncover functional relationships and high-level organization in non-model organisms. Our clustering approach allows us to de-noise the predicted network, producing highly informative functional modules. We also develop a novel algorithm called ReCIPE, which reconnects disconnected clusters, increasing functional enrichment and biological interpretability. We perform remote homology-based functional annotation, using hmmscan and GODomainMiner to assign initial functions to proteins at large evolutionary distances. Our clusters then enable us to assign new functions to uncharacterized proteins through "function by association." We demonstrate the ability of PHILHARMONIC to recover clusters with significant functional coherence in the reef-building coral Pocillopora damicornis, its algal symbiont Cladocopium goreaui, and the well-annotated fruit fly Drosophila melanogaster. We perform a deeper analysis of the P. damicornis network, where we show that PHILHARMONIC clusters correlate strongly with gene co-expression, and we investigate several clusters that participate in temperature regulation in the coral, including the first putative functional annotation of several previously uncharacterized proteins. Easy to run end-to-end and requiring only a sequenced proteome, PHILHARMONIC is an engine for biological hypothesis generation and discovery in non-model organisms. PHILHARMONIC is available at https://github.com/samsledje/philharmonic.