Abstract
Essential proteins are critical components of living organisms and are indispensable to cellular life. Identifying essential proteins is important for understanding the survival and development of organisms and the function of cellular machinery. Experimental methods for identifying them are usually costly and time-consuming. To overcome these limitations, many computational methods have been proposed to discover essential proteins based on the topological features of protein-protein interaction (PPI) networks and other biological information. In this paper, a new method named NIE is proposed to predict essential proteins based on second-order neighborhood information and the information entropy of protein complexes and subcellular localization. Firstly, a number of studies have shown that RNA-Seq data is more advantageous than traditional gene expression data in predicting essential proteins. Meanwhile, data analysis shows that protein essentiality is closely related to subcellular localization information, protein complex information and protein GO terms. A weighted PPI network, which integrates GO term information with the Pearson correlation coefficient of RNA-Seq data, is therefore constructed to reduce the impact of false positive and false negative interactions on the identification of essential proteins. Secondly, the information entropy of protein complexes and subcellular localization is calculated to represent the biological characteristics of proteins. Furthermore, an information propagation model is constructed, which combines the biological properties of proteins with the second-order neighborhood information in the PPI network to measure protein essentiality. In the experiments, the proposed method is applied to three common Saccharomyces cerevisiae datasets (DIP, Krogan and MIPS). NIE is compared with other commonly used algorithms, including LAC, NC, PeC, WDC, UC, LIDC and LBCC, to evaluate its performance. The results show that NIE achieves better performance in predicting essential proteins.
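Two of the quantities named in the abstract, the Pearson correlation coefficient between expression profiles and information entropy, can be illustrated with a minimal Python sketch. This is not the NIE formulation, whose exact weighting and propagation formulas are not given here. The function names, the toy expression values, and the choice to treat a constant profile as zero correlation are all assumptions made for illustration, and the GO term component of the edge weight is left out.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant profile carries no co-expression signal.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a discrete distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def weight_edges(edges, expression):
    """Attach a co-expression weight to each PPI edge (hypothetical scheme)."""
    return {(u, v): pearson(expression[u], expression[v]) for u, v in edges}


# Toy data: three hypothetical proteins with made-up RNA-Seq profiles.
expression = {
    "P1": [1.0, 2.0, 3.0],
    "P2": [2.0, 4.0, 6.0],
    "P3": [3.0, 2.0, 1.0],
}
weights = weight_edges([("P1", "P2"), ("P1", "P3")], expression)
```

In this toy example the edge P1-P2 receives a weight close to 1 because the two profiles rise together, while P1-P3 receives a weight close to -1. How negative correlations are handled, and how the counts passed to `shannon_entropy` would be derived from protein complexes or subcellular compartments, depends on definitions this text does not state.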
Highlights
Essential proteins are those whose deletion results in lethality or infertility of a cell [1]
We developed a novel method, named NIE, for predicting essential proteins based on second-order neighborhood information and information entropy
To evaluate the accuracy and efficiency of the proposed NIE algorithm, we implemented it in Matlab R2015b and ran it on three common datasets (DIP, Krogan and MIPS) on a PC with a 3.30 GHz quad-core processor and 8 GB of RAM
Summary
Essential proteins are those whose deletion results in lethality or infertility of a cell [1]. They are closely related to the structure, function, and regulation of biological systems and play a very important role throughout the life of the cell. Identifying essential proteins and studying their properties and mechanisms is of great significance in biology and pathology. Biological assays such as RNA interference [4], single gene knockout [5] and conditional gene knockout [6] can determine essential proteins accurately, but they are costly and time-consuming. With the rapid development of high-throughput and computer technologies, predicting and identifying essential proteins through bioinformatics and computational biology methods has become a trend.