Yeast Datasets Research Articles

Predicting Self-interacting proteins (SIPs) is a crucial area of research in predicting protein functions, as well as in understanding gene-disease and disease-drug associations. These interactions are integral to numerous cellular processes and play pivotal roles within cells. However, traditional methods for identifying SIPs through biological experiments are often expensive, time-consuming, and have long cycles. Therefore, the development of effective computational methods for accurately predicting SIPs is not only necessary but also presents a significant challenge. In this research, we introduce a novel computational prediction technique, VGGNGLCM, which leverages protein sequence data. This method integrates the VGGNet deep convolutional neural network (VGGN) with the Gray-Level Co-occurrence Matrix (GLCM) to detect Self-interacting proteins associations. Specifically, we initially utilized Position Specific Scoring Matrix (PSSM) to capture protein evolutionary information and integrated key features from PSSM using GLCM. We then employed VGGNet as a predictive classifier, leveraging its capabilities for powerful learning and classification prediction. Subsequently, the extracted features were input into the VGGNet deep convolutional neural network to identify Self-interacting proteins. To evaluate the performance of the VGGNGLCM model, we conducted experiments using yeast and human datasets, achieving average accuracies of 95.68% and 97.72% respectively. Additionally, we compared the prediction performance of the VGGNet classifier with that of the Convolutional Neural Network (CNN) and the state-of-the-art Support Vector Machine (SVM) using the same feature extraction method. We also compared the prediction ability of VGGNGLCM with other existing approaches. The comparison results further demonstrate the superior performance of VGGNGLCM over other prediction models in this domain. The experimental verification further strengthens the evidence that VGGNGLCM is effective and robust compared to existing methods. It also highlights the high accuracy and robustness of the VGGNGLCM model in predicting Self-interacting proteins (SIPs). Consequently, we believe that the VGGNGLCM method serves as a valuable computational tool and can catalyze extensive bioinformatics research related to SIPs prediction.

Read full abstract

BackgroundSelf-interacting proteins (SIPs), two or more copies of the protein that can interact with each other expressed by one gene, play a central role in the regulation of most living cells and cellular functions. Although numerous SIPs data can be provided by using high-throughput experimental techniques, there are still several shortcomings such as in time-consuming, costly, inefficient, and inherently high in false-positive rates, for the experimental identification of SIPs even nowadays. Therefore, it is more and more significant how to develop efficient and accurate automatic approaches as a supplement of experimental methods for assisting and accelerating the study of predicting SIPs from protein sequence information.ResultsIn this paper, we present a novel framework, termed GLCM-WSRC (gray level co-occurrence matrix-weighted sparse representation based classification), for predicting SIPs automatically based on protein evolutionary information from protein primary sequences. More specifically, we firstly convert the protein sequence into Position Specific Scoring Matrix (PSSM) containing protein sequence evolutionary information, exploiting the Position Specific Iterated BLAST (PSI-BLAST) tool. Secondly, using an efficient feature extraction approach, i.e., GLCM, we extract abstract salient and invariant feature vectors from the PSSM, and then perform a pre-processing operation, the adaptive synthetic (ADASYN) technique, to balance the SIPs dataset to generate new feature vectors for classification. Finally, we employ an efficient and reliable WSRC model to identify SIPs according to the known information of self-interacting and non-interacting proteins.ConclusionsExtensive experimental results show that the proposed approach exhibits high prediction performance with 98.10% accuracy on the yeast dataset, and 91.51% accuracy on the human dataset, which further reveals that the proposed model could be a useful tool for large-scale self-interacting protein prediction and other bioinformatics tasks detection in the future.

Read full abstract

Yeast Datasets Research Articles

Related Topics

Articles published on Yeast Datasets

A community effort to optimize sequence-based deep learning models of gene regulation.

NanoTrans: an integrated computational framework for comprehensive transcriptome analysis with nanopore direct RNA sequencing

Flexible modeling of regulatory networks improves transcription factor activity estimation

Reliable calibration and validation of phenomenological and hybrid models of high-cell-density fed-batch cultures subject to metabolic overflow

Deep convolutional neural networks with genetic algorithm-based synthetic minority over-sampling technique for improved imbalanced data classification

Dopcc: Detecting overlapping protein complexes via multi-metrics and co-core attachment method.

An Effective Computational Method for Predicting Self-Interacting Proteins Based on VGGNet Convolutional Neural Network and Gray-Level Co-occurrence Matrix.

Parameterization of asymmetric sigmoid functions in weighted gene co-expression network analysis

Essential proteins discovery based on dominance relationship and neighborhood similarity centrality.

Interlocus Gene Conversion, Natural Selection, and Paralog Homogenization.

MEM-FET: Essential protein prediction using membership feature and machine learning approach.

A Framework for Identifying Essential Proteins with Hybridizing Deep Neural Network and Ordinary Least Squares

Classification of Imbalanced Data Using SMOTE and AutoEncoder Based Deep Convolutional Neural Network

A novel hybrid CNN and BiGRU-Attention based deep learning model for protein function prediction.

MM-StackEns: A new deep multimodal stacked generalization approach for protein–protein interaction prediction

Hypothesis Testing in High-Dimensional Instrumental Variables Regression With an Application to Genomics Data

Robust and accurate prediction of self-interacting proteins from protein sequence information by exploiting weighted sparse representation based classifier

An autoencoder-based deep learning method for genotype imputation.

Complex Prediction in Large PPI Networks Using Expansion and Stripe of Core Cliques.

MGEGFP: a multi-view graph embedding method for gene function prediction based on adaptive estimation with GCN.

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Yeast Datasets Research Articles

Related Topics

Articles published on Yeast Datasets

A community effort to optimize sequence-based deep learning models of gene regulation.

NanoTrans: an integrated computational framework for comprehensive transcriptome analysis with nanopore direct RNA sequencing

Flexible modeling of regulatory networks improves transcription factor activity estimation

Reliable calibration and validation of phenomenological and hybrid models of high-cell-density fed-batch cultures subject to metabolic overflow

Deep convolutional neural networks with genetic algorithm-based synthetic minority over-sampling technique for improved imbalanced data classification

Dopcc: Detecting overlapping protein complexes via multi-metrics and co-core attachment method.

An Effective Computational Method for Predicting Self-Interacting Proteins Based on VGGNet Convolutional Neural Network and Gray-Level Co-occurrence Matrix.

Parameterization of asymmetric sigmoid functions in weighted gene co-expression network analysis

Essential proteins discovery based on dominance relationship and neighborhood similarity centrality.

Interlocus Gene Conversion, Natural Selection, and Paralog Homogenization.

MEM-FET: Essential protein prediction using membership feature and machine learning approach.

A Framework for Identifying Essential Proteins with Hybridizing Deep Neural Network and Ordinary Least Squares

Classification of Imbalanced Data Using SMOTE and AutoEncoder Based Deep Convolutional Neural Network

A novel hybrid CNN and BiGRU-Attention based deep learning model for protein function prediction.

MM-StackEns: A new deep multimodal stacked generalization approach for protein–protein interaction prediction

Hypothesis Testing in High-Dimensional Instrumental Variables Regression With an Application to Genomics Data

Robust and accurate prediction of self-interacting proteins from protein sequence information by exploiting weighted sparse representation based classifier

An autoencoder-based deep learning method for genotype imputation.

Complex Prediction in Large PPI Networks Using Expansion and Stripe of Core Cliques.

MGEGFP: a multi-view graph embedding method for gene function prediction based on adaptive estimation with GCN.