True Positive Predictions Research Articles

With the rapid development of information technology, which is widely used in all spheres of human life and activity, extremely large amounts of data have been accumulated today. By applying machine learning methods to this data, new practically useful knowledge can be obtained. The main goal of this paper is to study different machine learning methods for solving the classification problem and compare their efficiency and accuracy. A separate task is data pre-processing aimed at solving the problem of sample imbalance, as well as identifying the principal components that will be used to solve the classification problem. For this purpose, an information system for classifying the bankruptcy of a company with specified economic and financial characteristics was researched and developed. The study uses a dataset on the basis of which the efficiency and quality of application of several existing classification algorithms are evaluated. These classifiers are: conventional and linear Support Vector Machine, Extra Trees, Random Forest, Decision Tree, Logistic Regression, Multilayer perceptron Classifier, Gradient Boosting, Naive Bayes Classifier. For data pre-processing, we scaled the data, used the SMOTE method to get rid of the imbalance of the training sample, and performed principal component analysis and L1 regularisation. Principal component analysis allowed us to identify 15 principal components that have the greatest impact on classification accuracy and, accordingly, use them in the classification process. Analysing the results, we found that the best classifier was Random Forest with 95.9 % accuracy, and the worst was Naive Bayes with 85.1 %. To evaluate the quality of classification and select the best classifier, the Confusion matrix is used, which takes into account the number of true positive (TP) and true negative (TN) values, as well as the number of false negative (FN) and false positive (FP) classification results, and the values of such metrics as accuracy, precision, sensitivity, F1, and ROC. Accuracy is the percentage of correct answers given by the algorithm, while Recall is the number of TPs divided by the number of TPs plus the number of FNs. F1 indicates the balance between accuracy and sensitivity. Precision is the number of true positive predictions divided by the number of false positive and true negative predictions. ROC AUC is a tool for measuring performance for classification tasks at different thresholds. It shows how well a model can distinguish between classes. The conclusions present the main results of the study and indicate the main future direction of the work, namely, the study of classification results for other datasets and more efficient processing and analysis.

Simple SummaryCancer is caused by the accumulation of somatic mutations, some of which are responsible for the disease’s progression (drivers) while others are functionally neutral (passengers). Although several methods have been developed to distinguish between the two classes of mutations, very few have concentrated on using the neighborhood nucleotide sequences as potential discrimination features. In this study, we show that driver mutations’ neighborhood is significantly different from that of passengers. We further develop a novel machine learning tool, NBDriver, which is highly efficient at identifying pathogenic variants from multiple independent test datasets. Efficient and accurate identification of novel pathogenic variants from sequenced cancer genomes would help facilitate more effective therapies tailored to patients’ mutational profiles.Identifying cancer-causing mutations from sequenced cancer genomes hold much promise for targeted therapy and precision medicine. “Driver” mutations are primarily responsible for cancer progression, while “passengers” are functionally neutral. Although several computational approaches have been developed for distinguishing between driver and passenger mutations, very few have concentrated on using the raw nucleotide sequences surrounding a particular mutation as potential features for building predictive models. Using experimentally validated cancer mutation data in this study, we explored various string-based feature representation techniques to incorporate information on the neighborhood bases immediately 5′ and 3′ from each mutated position. Density estimation methods showed significant distributional differences between the neighborhood bases surrounding driver and passenger mutations. Binary classification models derived using repeated cross-validation experiments provided comparable performances across all window sizes. Integrating sequence features derived from raw nucleotide sequences with other genomic, structural, and evolutionary features resulted in the development of a pan-cancer mutation effect prediction tool, NBDriver, which was highly efficient in identifying pathogenic variants from five independent validation datasets. An ensemble predictor obtained by combining the predictions from NBDriver with three other commonly used driver prediction tools (FATHMM (cancer), CONDEL, and MutationTaster) significantly outperformed existing pan-cancer models in prioritizing a literature-curated list of driver and passenger mutations. Using the list of true positive mutation predictions derived from NBDriver, we identified a list of 138 known driver genes with functional evidence from various sources. Overall, our study underscores the efficacy of using raw nucleotide sequences as features to distinguish between driver and passenger mutations from sequenced cancer genomes.

True Positive Predictions Research Articles

Related Topics

Articles published on True Positive Predictions

A Digital Tool Supporting Pathology Practice and Identifying Leucocytes.

SimClone: Detecting Tabular Data Clones using Value Similarity

Strengthening Digital Security: Dynamic Attack Detection with LSTM, KNN, and Random Forest

Research of data mining methods for classification of imbalanced data sets

Towards adequate policy enhancement: An AI-driven decision tree model for efficient recognition and classification of EPA status via multi-emission parameters

Robust Detection of Chronic Lymphocytic Leukemia with Support Vector Machines and Flow Cytometry

Predicting the recurrence risk of renal cell carcinoma after nephrectomy: potential role of CT-radiomics for adjuvant treatment decisions.

Influence of uncertainty estimation techniques on false-positive reduction in liver lesion detection

Novel Techniques to Assess Predictive Systems and Reduce Their Alarm Burden.

A deep learning approach to median nerve evaluation in ultrasound images of carpal tunnel inlet

Network Architecture Influence on Facial Emotion Recognition

Verification of subclinical carotid atherosclerosis as part of risk stratification in overweight and obesity: the role of machine learning in the development of a diagnostic algorithm

Using Support Vector Machine (SVM) with GPS Ionospheric TEC Estimations to Potentially Predict Earthquake Events

Precursors-driven machine learning prediction of chaotic extreme pulses in Kerr resonators

Seizure Prediction in Genetic Rat Models of Absence Epilepsy: Improved Performance through Multiple-Site Cortico-Thalamic Recordings Combined with Machine Learning.

DNCON2_Inter: predicting interchain contacts for homodimeric and homomultimeric protein complexes using multiple sequence alignments of monomers and deep learning

Sequence Neighborhoods Enable Reliable Prediction of Pathogenic Mutations in Cancer Genomes.

Convolutional neural network and transfer learning for dose volume histogram prediction for prostate cancer radiotherapy

Mother’s Lifestyle Feature Relevance for NICU and Preterm Birth Prediction

Predicting lymph node metastasis in patients with oropharyngeal cancer by using a convolutional neural network with associated epistemic and aleatoric uncertainty

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

True Positive Predictions Research Articles

Related Topics

Articles published on True Positive Predictions

A Digital Tool Supporting Pathology Practice and Identifying Leucocytes.

SimClone: Detecting Tabular Data Clones using Value Similarity

Strengthening Digital Security: Dynamic Attack Detection with LSTM, KNN, and Random Forest

Research of data mining methods for classification of imbalanced data sets

Towards adequate policy enhancement: An AI-driven decision tree model for efficient recognition and classification of EPA status via multi-emission parameters

Robust Detection of Chronic Lymphocytic Leukemia with Support Vector Machines and Flow Cytometry

Predicting the recurrence risk of renal cell carcinoma after nephrectomy: potential role of CT-radiomics for adjuvant treatment decisions.

Influence of uncertainty estimation techniques on false-positive reduction in liver lesion detection

Novel Techniques to Assess Predictive Systems and Reduce Their Alarm Burden.

A deep learning approach to median nerve evaluation in ultrasound images of carpal tunnel inlet

Network Architecture Influence on Facial Emotion Recognition

Verification of subclinical carotid atherosclerosis as part of risk stratification in overweight and obesity: the role of machine learning in the development of a diagnostic algorithm

Using Support Vector Machine (SVM) with GPS Ionospheric TEC Estimations to Potentially Predict Earthquake Events

Precursors-driven machine learning prediction of chaotic extreme pulses in Kerr resonators

Seizure Prediction in Genetic Rat Models of Absence Epilepsy: Improved Performance through Multiple-Site Cortico-Thalamic Recordings Combined with Machine Learning.

DNCON2_Inter: predicting interchain contacts for homodimeric and homomultimeric protein complexes using multiple sequence alignments of monomers and deep learning

Sequence Neighborhoods Enable Reliable Prediction of Pathogenic Mutations in Cancer Genomes.

Convolutional neural network and transfer learning for dose volume histogram prediction for prostate cancer radiotherapy

Mother’s Lifestyle Feature Relevance for NICU and Preterm Birth Prediction

Predicting lymph node metastasis in patients with oropharyngeal cancer by using a convolutional neural network with associated epistemic and aleatoric uncertainty