Tenfold CV Research Articles

BackgroundAccumulating evidence has demonstrated that long non-coding RNAs (lncRNAs) are closely associated with human diseases, and it is useful for the diagnosis and treatment of diseases to get the relationships between lncRNAs and diseases. Due to the high costs and time complexity of traditional bio-experiments, in recent years, more and more computational methods have been proposed by researchers to infer potential lncRNA-disease associations. However, there exist all kinds of limitations in these state-of-the-art prediction methods as well.ResultsIn this manuscript, a novel computational model named FVTLDA is proposed to infer potential lncRNA-disease associations. In FVTLDA, its major novelty lies in the integration of direct and indirect features related to lncRNA-disease associations such as the feature vectors of lncRNA-disease pairs and their corresponding association probability fractions, which guarantees that FVTLDA can be utilized to predict diseases without known related-lncRNAs and lncRNAs without known related-diseases. Moreover, FVTLDA neither relies solely on known lncRNA-disease nor requires any negative samples, which guarantee that it can infer potential lncRNA-disease associations more equitably and effectively than traditional state-of-the-art prediction methods. Additionally, to avoid the limitations of single model prediction techniques, we combine FVTLDA with the Multiple Linear Regression (MLR) and the Artificial Neural Network (ANN) for data analysis respectively. Simulation experiment results show that FVTLDA with MLR can achieve reliable AUCs of 0.8909, 0.8936 and 0.8970 in 5-Fold Cross Validation (fivefold CV), 10-Fold Cross Validation (tenfold CV) and Leave-One-Out Cross Validation (LOOCV), separately, while FVTLDA with ANN can achieve reliable AUCs of 0.8766, 0.8830 and 0.8807 in fivefold CV, tenfold CV, and LOOCV respectively. Furthermore, in case studies of gastric cancer, leukemia and lung cancer, experiment results show that there are 8, 8 and 8 out of top 10 candidate lncRNAs predicted by FVTLDA with MLR, and 8, 7 and 8 out of top 10 candidate lncRNAs predicted by FVTLDA with ANN, having been verified by recent literature. Comparing with the representative prediction model of KATZLDA, comparison results illustrate that FVTLDA with MLR and FVTLDA with ANN can achieve the average case study contrast scores of 0.8429 and 0.8515 respectively, which are both notably higher than the average case study contrast score of 0.6375 achieved by KATZLDA.ConclusionThe simulation results show that FVTLDA has good prediction performance, which is a good supplement to future bioinformatics research.

BackgroundCytokines act by binding to specific receptors in the plasma membrane of target cells. Knowledge of cytokine–receptor interaction (CRI) is very important for understanding the pathogenesis of various human diseases—notably autoimmune, inflammatory and infectious diseases—and identifying potential therapeutic targets. Recently, machine learning algorithms have been used to predict CRIs. “Gold Standard” negative datasets are still lacking and strong biases in negative datasets can significantly affect the training of learning algorithms and their evaluation. To mitigate the unrepresentativeness and bias inherent in the negative sample selection (non-interacting proteins), we propose a clustering-based approach for representative negative sample selection.ResultsWe used deep autoencoders to investigate the effect of different sampling approaches for non-interacting pairs on the training and the performance of machine learning classifiers. By using the anomaly detection capabilities of deep autoencoders we deduced the effects of different categories of negative samples on the training of learning algorithms. Random sampling for selecting non-interacting pairs results in either over- or under-representation of hard or easy to classify instances. When K-means based sampling of negative datasets is applied to mitigate the inadequacies of random sampling, random forest (RF) together with the combined feature set of atomic composition, physicochemical-2grams and two different representations of evolutionary information performs best. Average model performances based on leave-one-out cross validation (loocv) over ten different negative sample sets that each model was trained with, show that RF models significantly outperform the previous best CRI predictor in terms of accuracy (+ 5.1%), specificity (+ 13%), mcc (+ 0.1) and g-means value (+ 5.1). Evaluations using tenfold cv and training/testing splits confirm the competitive performance.ConclusionsA comparative analysis was performed to assess the effect of three different sampling methods (random, K-means and uniform sampling) on the training of learning algorithms using different evaluation methods. Models trained on K-means sampled datasets generally show a significantly improved performance compared to those trained on random selections—with RF seemingly benefiting most in our particular setting. Our findings on the sampling are highly relevant and apply to many applications of supervised learning approaches in bioinformatics.

Tenfold CV Research Articles

Related Topics

Articles published on Tenfold CV

PHisPred: a tool for the identification of histidine phosphorylation sites by integrating amino acid patterns and properties

Decoding motor imagery with a simplified distributed dipoles model at source level.

Spatial prediction of soil organic carbon stocks in an arid rangeland using machine learning algorithms.

RETRACTED: Breast cancer diagnosis using multiple activation deep neural network

A novel decoding method for motor imagery tasks with 4D data representation and 3D convolutional neural networks

A novel computational model for predicting potential LncRNA-disease associations based on both direct and indirect features of LncRNA-disease pairs

Improved cytokine\u2013receptor interaction prediction by exploiting the negative sample space

A Method of Classification Performance Improvement Via a Strategy of Clustering-Based Data Elimination Integrated with k-Fold Cross-Validation

Comparative Study on Variable Selection Approaches in Establishment of Remote Sensing Model for Forest Biomass Estimation

Estimation of personal PM2.5 and BC exposure by a modeling approach – Results of a panel study in Shanghai, China

An improved gSVM-SCADL2 with firefly algorithm for identification of informative genes and pathways

AN INCREMENTAL FRAMEWORK BASED ON CROSS-VALIDATION FOR ESTIMATING THE ARCHITECTURE OF A MULTILAYER PERCEPTRON

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Tenfold CV Research Articles

Related Topics

Articles published on Tenfold CV

PHisPred: a tool for the identification of histidine phosphorylation sites by integrating amino acid patterns and properties

Decoding motor imagery with a simplified distributed dipoles model at source level.

Spatial prediction of soil organic carbon stocks in an arid rangeland using machine learning algorithms.

RETRACTED: Breast cancer diagnosis using multiple activation deep neural network

A novel decoding method for motor imagery tasks with 4D data representation and 3D convolutional neural networks

A novel computational model for predicting potential LncRNA-disease associations based on both direct and indirect features of LncRNA-disease pairs

Improved cytokine\u2013receptor interaction prediction by exploiting the negative sample space

A Method of Classification Performance Improvement Via a Strategy of Clustering-Based Data Elimination Integrated with k-Fold Cross-Validation

Comparative Study on Variable Selection Approaches in Establishment of Remote Sensing Model for Forest Biomass Estimation

Estimation of personal PM2.5 and BC exposure by a modeling approach – Results of a panel study in Shanghai, China

An improved gSVM-SCADL2 with firefly algorithm for identification of informative genes and pathways

AN INCREMENTAL FRAMEWORK BASED ON CROSS-VALIDATION FOR ESTIMATING THE ARCHITECTURE OF A MULTILAYER PERCEPTRON