Protein-protein interactions play important roles in nearly all events that take place in a cell. High-throughput experimental techniques enable the study of protein-protein interactions at the proteome scale through systematic identification of physical interactions among all proteins in an organism. High-throughput protein-protein interaction data, with ever-increasing volume, are becoming the foundation for new biological discoveries. A major challenge for bioinformatics is to manage, analyze, and model these data. In this review, we describe several databases that store, query, and visualize protein-protein interaction data. A comparison of experimental techniques shows that each high-throughput technique, such as the yeast two-hybrid assay or protein complex identification through mass spectrometry, has limitations in detecting certain types of interactions, and that the techniques are complementary to one another. In silico methods that use protein/DNA sequences and domain and structure information to predict protein-protein interactions can expand the scope of experimental data and increase the confidence in certain protein-protein interaction pairs. Protein-protein interaction data correlate with other types of data, including protein function, subcellular location, and gene expression profiles. Analyses of the global architecture of the large-scale interaction network in yeast indicate that highly connected proteins are more likely to be essential. Use of protein-protein interaction networks, preferably in conjunction with other types of data, allows assignment of cellular functions to novel proteins and derivation of new biological pathways. As demonstrated in our study on the yeast signal transduction pathway for amino acid transport, integration of high-throughput data with traditional biology resources can transform the protein-protein interaction data from noisy information into knowledge of cellular mechanisms.