Abstract

The detection of essential proteins in protein-protein interaction (PPI) networks is important for understanding the functions of organisms. At present, algorithms for essential protein search are mainly based on network topology and prior knowledge, so the biological knowledge contained in the PPI network itself is neglected. We therefore propose an algorithm that integrates the Database of Interacting Proteins and the STRING database to search for essential proteins (IDSSP), in which the prior proteins are the highest-scoring proteins in the STRING database. In addition, we propose a weakly supervised learning algorithm based on the results of IDSSP. First, we label the essential proteins in the IDSSP results. Then, we extract features of the PPI network nodes using a representation learning algorithm and the STRING database. Finally, machine learning classification algorithms are used to classify the essential proteins. The essential protein search results show that, in the best case, the top-k precision of IDSSP is 11.9% higher than that of state-of-the-art methods. The essential protein classification results indicate that classification methods combining biological features achieve higher F1-scores than those using only topological features. In conclusion, making full use of the biological information contained in the STRING database is more effective than using topological features alone for essential protein detection.
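The two evaluation measures named above can be illustrated with a minimal sketch. It assumes that top-k precision means the fraction of the k highest-scoring proteins that appear in a reference set of known essential proteins, and that F1 is the usual harmonic mean of precision and recall over predicted essential proteins. The function names and protein identifiers are hypothetical and are not taken from the IDSSP implementation.

```python
from typing import Dict, Set


def top_k_precision(scores: Dict[str, float], essential: Set[str], k: int) -> float:
    """Fraction of the k highest-scoring proteins that are known essential proteins.

    Assumption: ties are broken by insertion order (Python's sort is stable).
    """
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of scored proteins")
    ranked = sorted(scores, key=lambda p: scores[p], reverse=True)[:k]
    hits = sum(1 for protein in ranked if protein in essential)
    return hits / k


def f1_score(predicted: Set[str], essential: Set[str]) -> float:
    """Harmonic mean of precision and recall for proteins predicted as essential."""
    true_positives = len(predicted & essential)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(essential)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical scores for four made-up proteins, not real data.
    scores = {"P1": 0.9, "P2": 0.7, "P3": 0.4, "P4": 0.2}
    essential = {"P1", "P3"}
    print(top_k_precision(scores, essential, 2))   # 0.5
    print(f1_score({"P1", "P2"}, essential))       # 0.5
```

In this toy example, only one of the two top-ranked proteins (P1) is essential, so the top-2 precision is 0.5. Widening the window to k = 3 also captures P3, which gives 2/3.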
