Abstract
Human immunodeficiency virus-1 (HIV-1), the cause of acquired immune deficiency syndrome (AIDS), relies on human host cell proteins in virtually every aspect of its life cycle. Knowledge of the set of interacting human and viral proteins would greatly contribute to our understanding of the mechanisms of infection and, subsequently, to the design of new therapeutic approaches. This work is the first attempt to predict the global set of interactions between HIV-1 and human host cellular proteins. We propose a supervised learning framework in which multiple data sources are integrated, including co-occurrence of functional motifs and their interaction domains, protein classes, gene ontology annotations, posttranslational modifications, tissue distributions, gene expression profiles, topological properties of the human protein in the interaction network, and the similarity of HIV-1 proteins to human proteins' known binding partners. We trained and tested a Random Forest (RF) classifier on this extensive feature set. The model's predictions achieved a mean average precision (MAP) score of 23%. Among the predicted interactions was, for example, the pair of the HIV-1 protein Tat and the human vitamin D receptor; this interaction was recently validated experimentally in an independent study. The rank-ordered lists of predicted interacting pairs are a rich source of biological hypotheses. Among the novel predictions, transcription regulator activity, immune system process, and macromolecular complex were the most significant molecular function, biological process, and cellular compartment, respectively. Supplementary material is available at www.cs.cmu.edu/~oznur/hiv/hivPPI.html
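The abstract describes the pipeline only at a high level; as a minimal sketch of that kind of supervised setup, the snippet below trains a random forest on pair-level features and scores its ranked predictions with average precision (MAP averages this score over query proteins). The feature matrix, labels, and hyperparameters here are synthetic placeholders, not the authors' actual data, features, or evaluation protocol.

    # Illustrative sketch only: features and labels below are synthetic
    # placeholders, not the paper's data or evaluation protocol.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import average_precision_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # Hypothetical feature matrix: one row per (HIV-1 protein, human protein)
    # pair; columns stand in for motif/domain co-occurrence, GO annotations,
    # expression profiles, network topology, etc. Labels mark known interactions.
    X = rng.normal(size=(2000, 35))
    y = rng.integers(0, 2, size=2000)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0, stratify=y
    )

    clf = RandomForestClassifier(n_estimators=500, random_state=0)
    clf.fit(X_train, y_train)

    # Rank candidate pairs by predicted interaction probability and score the
    # ranking with average precision.
    scores = clf.predict_proba(X_test)[:, 1]
    print(f"average precision: {average_precision_score(y_test, scores):.3f}")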
Highlights
Unsupervised representation learning has been highly successful in the domain of natural language processing [7, 19, 24, 25, 10].
Borrowing ideas from orderless NADE [32], we propose a permutation language modeling objective that retains the benefits of autoregressive (AR) models while allowing the model to capture bidirectional contexts (a toy sketch follows these highlights).
Relative segment encodings: architecturally, unlike BERT, which adds an absolute segment embedding to the word embedding at each position, we extend the idea of relative encodings from Transformer-XL to encode the segments.
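To make the permutation language modeling highlight concrete, here is a toy sketch: sample a factorization order z, then predict each token from the tokens that precede it under z. The crude mean-pooled context encoder is a placeholder for the paper's Transformer-XL backbone with two-stream attention, and all sizes are arbitrary.

    # Toy permutation LM objective (not the paper's two-stream implementation).
    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    vocab_size, seq_len, dim = 100, 8, 32

    embed = torch.nn.Embedding(vocab_size, dim)
    head = torch.nn.Linear(dim, vocab_size)

    tokens = torch.randint(vocab_size, (seq_len,))
    z = torch.randperm(seq_len)  # sampled factorization order

    loss = 0.0
    for t in range(1, seq_len):
        ctx = tokens[z[:t]]            # tokens preceding position z[t] under z
        h = embed(ctx).mean(dim=0)     # stand-in context encoder
        loss = loss + F.cross_entropy(
            head(h).unsqueeze(0), tokens[z[t]].unsqueeze(0)
        )
    loss = loss / (seq_len - 1)

For the relative segment encodings highlight, the key point is that attention only needs to know whether two positions share a segment, not which absolute segment each belongs to; a hedged two-liner:

    seg = torch.tensor([0, 0, 0, 1, 1])                   # segment id per position
    same_segment = seg.unsqueeze(0) == seg.unsqueeze(1)   # (len, len) pair relation
    # A learned bias conditioned on this relation is added to attention scores,
    # rather than adding absolute segment embeddings as BERT does.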
Summary
Unsupervised representation learning has been highly successful in the domain of natural language processing [7, 19, 24, 25, 10]. These methods first pretrain neural networks on large-scale unlabeled text corpora and then finetune the models or representations on downstream tasks. Under this shared high-level idea, different unsupervised pretraining objectives have been explored in the literature.
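As an illustration of this pretrain-then-finetune workflow, the sketch below loads a pretrained checkpoint and takes one gradient step on a downstream classification objective. The use of the Hugging Face transformers library and its "xlnet-base-cased" checkpoint is an assumption for this sketch, not the paper's own codebase.

    # Pretrain-then-finetune pattern (Hugging Face transformers assumed;
    # not the paper's code).
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "xlnet-base-cased", num_labels=2
    )

    # Finetuning reuses the pretrained weights; only the task head is new.
    batch = tokenizer(["great movie", "terrible movie"],
                      padding=True, return_tensors="pt")
    outputs = model(**batch, labels=torch.tensor([1, 0]))
    outputs.loss.backward()  # one gradient step of the downstream objective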