Easy-PSAP: An Integrated Workflow to Prioritize Pathogenic Variants in Sequence Data from a Single Individual

Abstract

Introduction: Next-generation sequencing (NGS) data analysis has become an integral part of clinical genetic diagnosis, raising the question of variant prioritization. The Population Sampling Probability (PSAP) method was developed to tackle variant prioritization in the exome of a single patient by leveraging allele frequencies from population databases together with a variant pathogenicity score. Methods: Here, we present Easy-PSAP, a completely new implementation of the PSAP method comprising two user-friendly and highly adaptable pipelines. Easy-PSAP allows the gene-based recalibration of any in silico pathogenicity prediction score against the scores of variants seen in the general population, including popular scores such as CADD or AlphaMissense. Easy-PSAP can evaluate genetic variants at the scale of a whole exome or a whole genome using information from the latest population and annotation databases. Results: Through simulations on synthetic disease exomes, we show that Easy-PSAP ranks more than 50% of causal pathogenic variants in the top 10 for an autosomal dominant model of transmission and in the top 1 for an autosomal recessive model of transmission. Discussion: These findings, along with the accessibility of the pipeline to both researchers and clinicians, make Easy-PSAP a state-of-the-art tool for variant prioritization in NGS data, one that can continue to evolve as new frameworks and databases become available. Easy-PSAP is implemented in R and bash within an open-source Snakemake framework. It is available on GitHub alongside conda environments containing the required dependencies (https://github.com/msogloblinsky/Easy-PSAP).
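The core PSAP idea, ranking a patient's variants by how improbable their pathogenicity score is relative to variants seen in the general population, can be sketched as a simple empirical null. This is a minimal illustration, not the Easy-PSAP implementation; all gene names, scores, and the smoothing choice are hypothetical:

```python
# Sketch of PSAP-style prioritization (illustrative only): each candidate
# variant gets an empirical p-value -- the fraction of population variants in
# the same gene whose pathogenicity score (e.g. CADD) is at least as high --
# and candidates are ranked by that p-value.

def psap_like_pvalue(score, population_scores):
    """Empirical P(population score >= observed score), with +1 smoothing."""
    hits = sum(1 for s in population_scores if s >= score)
    return (hits + 1) / (len(population_scores) + 1)

def rank_candidates(candidates, population_scores_by_gene):
    """candidates: list of (variant_id, gene, score). Lower p-value ranks first."""
    scored = [(psap_like_pvalue(score, population_scores_by_gene[gene]), vid)
              for vid, gene, score in candidates]
    return [vid for _, vid in sorted(scored)]

# Toy example: the variant that is extreme for its gene ranks first.
pop = {"GENE1": [0.1, 0.5, 1.2, 2.0, 3.1], "GENE2": [5.0, 6.0, 7.5, 8.0]}
order = rank_candidates([("v1", "GENE1", 3.0), ("v2", "GENE2", 6.5)], pop)
```

Easy-PSAP itself computes calibrated null distributions per gene and per transmission model; this sketch only conveys the ranking principle.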

Similar Papers
  • Research Article
  • Citations: 12
  • 10.1016/j.jmoldx.2022.06.006
NGS4THAL, a One-Stop Molecular Diagnosis and Carrier Screening Tool for Thalassemia and Other Hemoglobinopathies by Next-Generation Sequencing
  • Jul 19, 2022
  • The Journal of molecular diagnostics : JMD
  • Yujie Cao + 15 more

Thalassemia is one of the most common genetic diseases and a major health threat worldwide. Accurate, efficient, and scalable analysis of next-generation sequencing (NGS) data is much needed for its molecular diagnosis and carrier screening. We developed NGS4THAL, a bioinformatics pipeline that analyzes NGS data to detect pathogenic variants for thalassemia and other hemoglobinopathies. NGS4THAL realigns ambiguously mapped NGS reads derived from the homologous Hb gene clusters for accurate detection of point mutations and small insertions/deletions. It uses a combination of complementary structural variant (SV) detection tools and an in-house database of control data containing specific SVs to achieve accurate detection of the complex SV types. Detected variants are matched with those in HbVar (A Database of Human Hemoglobin Variants and Thalassemia Mutations), allowing recognition of known pathogenic variants, including disease modifiers. Tested on simulated data, NGS4THAL achieved high sensitivity and specificity. For targeted NGS data from samples with laboratory-confirmed pathogenic Hb variants, it achieved 100% detection accuracy. Application of NGS4THAL on whole genome sequencing data from unrelated studies revealed thalassemia mutation carrier rates for Hong Kong Chinese and Northern Vietnamese that were consistent with previous reports. NGS4THAL is a highly accurate and efficient molecular diagnosis tool for thalassemia and other hemoglobinopathies based on tailored analysis of NGS data and may be scaled for population carrier screening.
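The "match detected variants against a curated database" step described above (HbVar in NGS4THAL) amounts to a keyed lookup. A toy sketch; the coordinates, entries, and labels below are illustrative, not real HbVar records:

```python
# Illustrative sketch of matching variant calls to a known-pathogenic
# database, HbVar-style. Keys are (chrom, pos, ref, alt); all entries here
# are hypothetical placeholders, not real database content.

KNOWN_PATHOGENIC = {
    ("chr11", 5227002, "T", "A"): "HBB variant (hypothetical entry)",
    ("chr16", 223025, "A", "G"):  "HBA2 variant (hypothetical entry)",
}

def annotate(variants):
    """variants: list of (chrom, pos, ref, alt); returns (variant, label-or-None)."""
    return [(v, KNOWN_PATHOGENIC.get(v)) for v in variants]

calls = [("chr11", 5227002, "T", "A"), ("chr11", 5226999, "G", "C")]
annotated = annotate(calls)
```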

  • Research Article
  • 10.1093/gigascience/giaa141
PM4NGS, a project management framework for next-generation sequencing data analysis.
  • Jan 7, 2021
  • GigaScience
  • Roberto Vera Alvarez + 3 more

FAIR (Findability, Accessibility, Interoperability, and Reusability) next-generation sequencing (NGS) data analysis relies on complex computational biology workflows and pipelines to guarantee reproducibility, portability, and scalability. Moreover, workflow languages, managers, and container technologies have helped address the problem of executing data analysis pipelines across multiple platforms in scalable ways. Here, we present PM4NGS, a project management framework for NGS data analysis. The framework combines automatic creation of a standard organizational structure of directories and files, bioinformatics tool management using Docker or Bioconda, and data analysis pipelines in CWL format. Pre-configured Jupyter notebooks with minimal Python code are included in PM4NGS to produce a project report and publication-ready figures. We present three pipelines for demonstration purposes, covering the analysis of RNA-Seq, ChIP-Seq, and ChIP-exo datasets. PM4NGS is an open-source framework that creates a standard organizational structure for NGS data analysis projects. PM4NGS is easy to install, configure, and use by non-bioinformaticians on personal computers and laptops. It permits execution of NGS data analysis on Windows 10 with the Windows Subsystem for Linux feature activated. The framework aims to reduce the gap between researchers in experimental laboratories producing NGS data and the workflows needed to analyze it. PM4NGS documentation can be accessed at https://pm4ngs.readthedocs.io/.
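The "standard organizational structure of directories and files" idea can be sketched in a few lines. The directory names below are illustrative conventions, not necessarily PM4NGS's actual layout:

```python
# Minimal sketch of scaffolding a conventional NGS project layout.
# The subdirectory names are illustrative; PM4NGS's real structure may differ.
import os
import tempfile

LAYOUT = ["bin", "config", "data", "doc", "notebooks", "results", "src"]

def create_project(root, name):
    """Create the project directory and its standard subdirectories."""
    project = os.path.join(root, name)
    for sub in LAYOUT:
        os.makedirs(os.path.join(project, sub), exist_ok=True)
    return project

proj = create_project(tempfile.mkdtemp(), "rnaseq-demo")
subdirs = sorted(os.listdir(proj))
```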

  • Research Article
  • 10.1097/qad.0000000000003820
Assessing transmission attribution risk from simulated sequencing data in HIV molecular epidemiology.
  • Mar 4, 2024
  • AIDS (London, England)
  • Fabrícia F Nascimento + 3 more

HIV molecular epidemiology (ME) is the analysis of sequence data together with individual-level clinical, demographic, and behavioral data to understand HIV epidemiology. The use of ME has raised concerns regarding identification of the putative source in direct transmission events. This could result in harm ranging from stigma to criminal prosecution in some jurisdictions. Here we assessed the risks of ME using simulated HIV genetic sequencing data. We simulated social networks of men who have sex with men, calibrating the simulations to data from San Diego. We used these networks to simulate consensus and next-generation sequence (NGS) data to evaluate the risks of identifying direct transmissions using different HIV sequence lengths and population sampling depths. To identify the source of transmissions, we calculated infector probability and used the phyloscanner software for the analysis of consensus and NGS data, respectively. Consensus sequence analyses showed that the risk of correctly inferring the source (direct transmission) within identified transmission pairs was very small and independent of sampling depth. In contrast, NGS analyses showed that identification of the source of a transmission was very accurate, but only for 6.5% of inferred pairs. False positive transmissions were also observed, where one or more unobserved intermediaries were present when compared to the true network. Source attribution using consensus sequences rarely infers direct transmission pairs with high confidence but is still useful for population studies. In contrast, source attribution using NGS data was much more accurate in identifying direct transmission pairs, but for only a small percentage of transmission pairs analyzed.

  • Research Article
  • Citations: 26
  • 10.1161/circgenetics.113.000085
Short Read (Next-Generation) Sequencing
  • Jul 14, 2013
  • Circulation: Cardiovascular Genetics
  • Jaya Punetha + 1 more

Rapid advances in DNA sequencing technologies have made it increasingly cost-effective to obtain accurate and timely large-scale genomic sequence data on individuals (short read massively parallel or next generation [next-gen]). A next-gen molecular diagnostic approach that has seen rapid deployment in the clinic over the last year is exome sequencing. Whole exome sequencing covers all protein-coding genes in the genome (≈1.1% of genome), and an exome test for a single patient generates ≈6 gigabases (10⁹ bp) of DNA sequence data. A key challenge facing routine use of next-gen data in patient diagnosis and management is data interpretation. What sequence variant findings are relevant to diagnosis (pathogenic mutations)? What sequence variant findings are relevant to clinical care but not necessarily to patient diagnosis (clinically actionable incidental data)? What sequence information should be stored, and where can it be stored? This review provides a tutorial on current approaches to answering these questions. A recent landmark study showed that application of next-gen sequencing to a large cohort of idiopathic dilated cardiomyopathy patients found ≈27% of patients to show mutations of the titin gene, the most complex gene in the genome (363 exons). We use titin in cardiomyopathy as an exemplar for explaining next-gen sequencing approaches and data interpretation. Decreasing sequencing costs and broad dissemination of next-generation (next-gen) equipment and expertise are increasing availability of massively parallel sequencing of patient DNA samples (short read massively parallel or next-gen sequencing).1,2 Most rapidly expanding is exome sequencing, where all protein-coding sequences (exons) are selected from total genomic DNA and selectively sequenced.3 Alternative approaches to next-gen sequencing include targeted sequencing (TS) and whole genome (complete genome) sequencing.
Currently, marketed targeted Sanger sequencing panels using traditional individual exon-by-exon sequencing remain expensive and time consuming, and massively parallel next-gen approaches are beginning to supplant …

  • Research Article
  • Citations: 453
  • 10.1093/nar/gks003
cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate
  • Jan 2, 2012
  • Nucleic Acids Research
  • Günter Klambauer + 6 more

Quantitative analyses of next-generation sequencing (NGS) data, such as the detection of copy number variations (CNVs), remain challenging. Current methods detect CNVs as changes in the depth of coverage along chromosomes. Technological or genomic variations in the depth of coverage thus lead to a high false discovery rate (FDR), even upon correction for GC content. In the context of association studies between CNVs and disease, a high FDR means many false CNVs, thereby decreasing the discovery power of the study after correction for multiple testing. We propose ‘Copy Number estimation by a Mixture Of PoissonS’ (cn.MOPS), a data processing pipeline for CNV detection in NGS data. In contrast to previous approaches, cn.MOPS incorporates modeling of depths of coverage across samples at each genomic position. Therefore, cn.MOPS is not affected by read count variations along chromosomes. Using a Bayesian approach, cn.MOPS decomposes variations in the depth of coverage across samples into integer copy numbers and noise by means of its mixture components and Poisson distributions, respectively. The noise estimate allows for reducing the FDR by filtering out detections having high noise that are likely to be false detections. We compared cn.MOPS with the five most popular methods for CNV detection in NGS data using four benchmark datasets: (i) simulated data, (ii) NGS data from a male HapMap individual with implanted CNVs from the X chromosome, (iii) data from HapMap individuals with known CNVs, (iv) high coverage data from the 1000 Genomes Project. cn.MOPS outperformed its five competitors in terms of precision (1–FDR) and recall for both gains and losses in all benchmark data sets. The software cn.MOPS is publicly available as an R package at http://www.bioinf.jku.at/software/cnmops/ and at Bioconductor.
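The central modeling idea, evaluating read counts against the Poisson rates implied by integer copy numbers, can be conveyed with a toy maximum-likelihood version. The real cn.MOPS model is a Bayesian mixture fit jointly across samples; this per-sample shortcut and its parameters are purely illustrative:

```python
# Toy version of the cn.MOPS intuition (not the actual algorithm): model the
# read count of a sample at a position as Poisson with rate lambda * cn / 2,
# where lambda is the diploid coverage, and report the most likely integer cn.
import math

def poisson_logpmf(k, lam):
    """log P(X = k) for X ~ Poisson(lam)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def map_copy_number(count, diploid_rate, max_cn=4):
    # CN 0 uses a small positive rate so the log stays finite.
    return max(range(0, max_cn + 1),
               key=lambda cn: poisson_logpmf(count, max(diploid_rate * cn / 2, 1e-3)))

# Read counts across five samples at one position, diploid coverage ~100:
counts = [98, 103, 51, 148, 2]
calls = [map_copy_number(c, 100.0) for c in counts]
```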

  • Research Article
  • Citations: 14
  • 10.1007/s10142-024-01415-x
A review on advancements in feature selection and feature extraction for high-dimensional NGS data analysis.
  • Aug 19, 2024
  • Functional & integrative genomics
  • Kasmika Borah + 5 more

Recent advancements in biomedical technologies and the proliferation of high-dimensional Next Generation Sequencing (NGS) datasets have led to significant growth in the bulk and density of data. High-dimensional NGS data, characterized by a large number of genomics, transcriptomics, proteomics, and metagenomics features relative to the number of biological samples, present significant challenges for reducing feature dimensionality. The high dimensionality of NGS data poses significant challenges for data analysis, including increased computational burden, potential overfitting, and difficulty in interpreting results. Feature selection and feature extraction are two pivotal techniques employed to address these challenges by reducing the dimensionality of the data, thereby enhancing model performance, interpretability, and computational efficiency. Both can be categorized into statistical and machine learning methods. The present study conducts a comprehensive and comparative review of various statistical, machine learning, and deep learning-based feature selection and extraction techniques specifically tailored to the interpretation of human NGS and microarray data. A thorough literature search was performed to gather information on these techniques, focusing on array-based and NGS data analysis. Various techniques, including deep learning architectures, machine learning algorithms, and statistical methods, are surveyed here for microarray, bulk RNA-Seq, and single-cell RNA-Seq (scRNA-Seq) datasets. The study provides an overview of these techniques, highlighting their applications, advantages, and limitations in the context of high-dimensional NGS data.
This review provides better insights for readers to apply feature selection and feature extraction techniques to enhance the performance of predictive models, uncover underlying biological patterns, and gain deeper insights into massive and complex NGS and microarray data.
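As a concrete instance of the simplest statistical family the review covers, here is a library-free variance-based feature selector (real pipelines would use e.g. scikit-learn; the expression values are made up):

```python
# Minimal statistical feature selection: keep the k features with the
# highest variance across samples.

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def top_k_features(matrix, names, k):
    """matrix: samples x features (list of rows); returns the k most variable names."""
    cols = list(zip(*matrix))                      # transpose to features
    ranked = sorted(zip(names, cols), key=lambda nc: variance(nc[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

# Toy expression matrix: 3 samples x 3 genes; geneB varies most.
expr = [[1.0, 5.0, 2.0],
        [1.1, 9.0, 2.1],
        [0.9, 1.0, 1.9]]
selected = top_k_features(expr, ["geneA", "geneB", "geneC"], k=1)
```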

  • Research Article
  • 10.1158/1538-7445.am2019-1660
Abstract 1660: Identification of allelic imbalance utilizing heterozygous genotype allele frequencies and intensities
  • Jul 1, 2019
  • Cancer Research
  • Kyle Chang + 6 more

Allelic imbalance (AI) events, such as amplification, deletion, or copy-neutral loss-of-heterozygosity (cn-LOH), can result in the activation of oncogenes or inactivation of tumor suppressor genes critical to the process of carcinogenesis and metastasis. However, tumor samples often have low tumor cellularity and small subclones that require a sensitive and robust algorithm for AI detection. To overcome this challenge, our lab developed an original software application, hapLOH (Vattathil and Scheet, 2013), that identifies AI with high sensitivity by applying a Hidden Markov Model to heterozygous genotype allele frequencies in SNP array data. Motivated by the plethora of next-generation sequencing (NGS) data, we carried this logic over and developed hapLOHseq (San Lucas et al, 2016) for detection of subtle AI in NGS data using reference and alternate allele read depths and a set of haplotype estimates based on 1000 Genomes data. Recently, we sought to improve AI detection and provide support for different data types by developing haploh-cn. Haploh-cn extends our previous tools by supporting both SNP array and NGS data, incorporating intensity data and Log R ratios (LRR) into the underlying Hidden Markov Model for SNP array data, and incorporating sequencing depth for NGS data. To evaluate haploh-cn's performance, we downloaded The Cancer Genome Atlas lung adenocarcinoma whole exome sequencing data and the corresponding Affymetrix SNP6 array data. We then simulated tumor cellularity by performing in-silico dilution of the NGS and SNP array data. For NGS data, we downsampled tumor sequencing BAM files and mixed in matched-normal sequencing reads. For SNP6 array data, we combined the adjusted B allele frequencies of heterozygous sites in the tumor sample and matched-normal sample.
Using the AI events detected in the undiluted tumor samples as the gold standard, we assessed the performance of haploh-cn in the computationally diluted samples. Our results showed that haploh-cn had high sensitivity without sacrificing specificity in diluted tumor samples. In summary, haploh-cn is a robust and powerful method for profiling subtle AI in NGS and SNP array data. Citation Format: Kyle Chang, Francis A. San Lucas, Zuhal Ozcan, Smruthy Sivakumar, Yasminka A. Jakubek, Richard G. Fowler, Paul Scheet. Identification of allelic imbalance utilizing heterozygous genotype allele frequencies and intensities [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 1660.
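The in-silico dilution of SNP array data described above reduces to a purity-weighted mixture of tumor and matched-normal B allele frequencies. A minimal sketch with illustrative values:

```python
# Sketch of in-silico dilution: the B allele frequency (BAF) of a diluted
# sample is a purity-weighted mix of tumor and matched-normal BAFs.
# The BAF values and purities below are illustrative.

def dilute_baf(baf_tumor, baf_normal, purity):
    """Simulated BAF at tumor cellularity `purity` (0..1)."""
    return purity * baf_tumor + (1.0 - purity) * baf_normal

# A cn-LOH site: tumor BAF 0.9, heterozygous normal BAF 0.5.
diluted = [round(dilute_baf(0.9, 0.5, p), 3) for p in (1.0, 0.5, 0.2)]
```

As purity drops, the diluted BAF shrinks toward the heterozygous 0.5, which is exactly why low-cellularity samples demand a sensitive detector.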

  • Research Article
  • Citations: 251
  • 10.1093/nar/gkr599
SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data.
  • Aug 3, 2011
  • Nucleic acids research
  • Zhi Wei + 4 more

We develop a statistical tool, SNVer, for calling common and rare variants in analysis of pooled or individual next-generation sequencing (NGS) data. We formulate variant calling as a hypothesis testing problem and employ a binomial–binomial model to test the significance of observed allele frequency against sequencing error. SNVer reports a single overall P-value for evaluating the significance of a candidate locus being a variant, based on which multiplicity control can be obtained. This is particularly desirable because tens of thousands of loci are simultaneously examined in typical NGS experiments. Each user can choose the false-positive error rate threshold he or she considers appropriate, instead of just the dichotomous decision of whether to ‘accept or reject the candidates’ provided by most existing methods. We use both simulated data and real data to demonstrate the superior performance of our program in comparison with existing methods. SNVer runs very fast and can test 300 K loci within an hour. This excellent scalability makes it feasible to analyze whole-exome sequencing data, or even whole-genome sequencing data, using a high-performance computing cluster. SNVer is freely available at http://snver.sourceforge.net/.
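The hypothesis test at the heart of this formulation is easy to state: under the null, alt-allele reads arise only from sequencing error. A library-free sketch of a one-sided binomial test (the error rate and read counts are illustrative; SNVer's actual binomial–binomial model also accounts for pooling):

```python
# One-sided binomial test for variant calling: given depth n and alt-allele
# count x, the p-value is P(X >= x) under X ~ Binomial(n, e), where e is the
# per-base sequencing error rate.
from math import comb

def variant_pvalue(alt_count, depth, error_rate=0.01):
    return sum(comb(depth, k) * error_rate**k * (1 - error_rate)**(depth - k)
               for k in range(alt_count, depth + 1))

# 8 alt reads out of 30 is far more than a 1% error rate explains;
# 1 alt read out of 30 is entirely compatible with error.
p_signal = variant_pvalue(8, 30)
p_noise = variant_pvalue(1, 30)
```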

  • Research Article
  • 10.4081/itjm.2024.1830
Screening of the key genes and signaling pathways for schizophrenia using bioinformatics and next generation sequencing data analysis
  • Dec 18, 2024
  • Italian Journal of Medicine
  • Iranna Kotturshetti + 4 more

Schizophrenia is thought to be the most prevalent chronic psychiatric disorder. Researchers have identified numerous proteins associated with the occurrence and development of schizophrenia. This study aimed to identify potential core genes and pathways involved in schizophrenia through exhaustive bioinformatics and next generation sequencing (NGS) data analyses, using GSE106589 NGS data of neural progenitor cells and neurons obtained from healthy controls and patients with schizophrenia. The NGS data were downloaded from the Gene Expression Omnibus database and processed with the DESeq2 package in R, and differentially expressed genes (DEGs) were identified. Gene ontology (GO) enrichment analysis and REACTOME pathway enrichment analysis were carried out to identify potential biological functions and pathways of the DEGs. Protein-protein interaction network, module, micro-RNA (miRNA)-hub gene regulatory network, transcription factor (TF)-hub gene regulatory network, and drug-hub gene interaction network analyses were performed to identify the hub genes, miRNAs, TFs, and drug molecules. Potential hub genes were analyzed using receiver operating characteristic curves in R. In this investigation, a total of 955 DEGs were identified: 478 genes were upregulated and 477 genes were downregulated. These genes were enriched for GO terms and pathways mainly involved in the multicellular organismal process, G protein-coupled receptor ligand binding, regulation of cellular processes, and amine ligand-binding receptors. MYC, FN1, CDKN2A, EEF1G, CAV1, ONECUT1, SYK, MAPK13, TFAP2A, and BTK were considered the potential hub genes. The miRNA-hub gene regulatory network, TF-hub gene regulatory network, and drug-hub gene interaction network were constructed successfully and predicted key miRNAs, TFs, and drug molecules for schizophrenia diagnosis and treatment.
On the whole, the findings of this investigation enhance our understanding of the potential molecular mechanisms of schizophrenia and provide potential targets for further investigation.

  • Research Article
  • Citations: 17
  • 10.1186/s12864-020-6455-x
Comparison of multiple algorithms to reliably detect structural variants in pears
  • Jan 20, 2020
  • BMC Genomics
  • Yueyuan Liu + 6 more

Background: Structural variations (SVs) have been reported to play an important role in genetic diversity and trait regulation. Many computer algorithms detecting SVs have recently been developed, but the use of multiple algorithms to detect high-confidence SVs has not been studied. The most suitable sequencing depth for detecting SVs in pear is also not known.
Results: In this study, a pipeline to detect SVs using next-generation and long-read sequencing data was constructed. The performances of seven types of SV detection software using next-generation sequencing (NGS) data and two types of software using long-read sequencing data (SVIM and Sniffles), which are based on different algorithms, were compared. Of the nine software packages evaluated, SVIM identified the most SVs, and Sniffles detected SVs with the highest accuracy (> 90%). When the results from multiple SV detection tools were combined, the SVs identified by both MetaSV and IMR/DENOM, which use NGS data, were more accurate than those identified by both SVIM and Sniffles, with mean accuracies of 98.7% and 96.5%, respectively. The software packages using long-read sequencing data required fewer CPU cores and less memory and ran faster than those using NGS data. In addition, according to the performances of assembly-based algorithms using NGS data, we found that a sequencing depth of 50× is appropriate for detecting SVs in the pear genome.
Conclusion: This study provides strong evidence that more than one SV detection software package, each based on a different algorithm, should be used to detect SVs with higher confidence, and that long-read sequencing data are better than NGS data for SV detection. The SV detection pipeline that we have established will facilitate the study of diversity in other crops.
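Combining calls from multiple SV tools, as the study recommends, is commonly done with a reciprocal-overlap heuristic: keep a call only if a second tool reports an interval sharing at least some fraction of both calls. A minimal sketch (the 50% threshold and intervals are illustrative, not the paper's exact criterion):

```python
# Consensus SV calling sketch: keep calls from tool A that tool B supports
# with >= 50% reciprocal overlap. Intervals are (chrom, start, end).

def reciprocal_overlap(a, b):
    """Fraction of the larger of the two intervals that they share."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return overlap / max(a[2] - a[1], b[2] - b[1])

def consensus(calls_a, calls_b, threshold=0.5):
    return [a for a in calls_a
            if any(reciprocal_overlap(a, b) >= threshold for b in calls_b)]

tool_a = [("chr1", 100, 200), ("chr2", 500, 900)]
tool_b = [("chr1", 120, 210), ("chr3", 500, 900)]
confirmed = consensus(tool_a, tool_b)
```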

  • Book Chapter
  • Citations: 5
  • 10.1007/978-3-319-17157-9_4
Variant Calling Using NGS Data in European Aspen (Populus tremula)
  • Jan 1, 2015
  • Jing Wang + 3 more

Analysis of next-generation sequencing (NGS) data is rapidly becoming an important source of information for genetics and genomics studies. The utility of such data does, however, rely crucially on the accuracy and quality of SNP and genotype calling. Identification of genetic variants (SNPs and short indels) from NGS data is an area of active research and many recent statistical methods have been developed to both improve and quantify the large uncertainty associated with genotype calling. The detection of genetic variants from NGS data is prone to errors, due to multiple factors such as base-calling, alignment errors, and read coverage. Here we highlight some of the issues and review and exemplify some of the recent methods that have been developed for genotype calling. We also provide guidelines for their application to whole-genome re-sequencing data using a data set based on a number of European aspen (Populus tremula) individuals each sequenced to a depth of about 20× coverage per individual.
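The probabilistic genotype calling the chapter reviews can be reduced to a toy Bayesian calculation: binomial likelihoods for the three diploid genotypes under a flat prior. Real callers add informed priors, base qualities, and haplotype information; the error rate and counts here are illustrative:

```python
# Toy genotype caller: posterior over genotypes RR/RA/AA from ref/alt read
# counts, using binomial likelihoods with error rate e and a flat prior.
from math import comb

def genotype_posteriors(ref_count, alt_count, error=0.01):
    n = ref_count + alt_count
    # Expected alt-allele fraction under each genotype:
    alt_p = {"RR": error, "RA": 0.5, "AA": 1 - error}
    lik = {g: comb(n, alt_count) * p**alt_count * (1 - p)**(n - alt_count)
           for g, p in alt_p.items()}
    total = sum(lik.values())
    return {g: l / total for g, l in lik.items()}

post = genotype_posteriors(12, 11)   # roughly balanced counts
call = max(post, key=post.get)
```

With balanced read counts the heterozygous genotype dominates the posterior; as coverage falls, the posterior flattens, which is the uncertainty the chapter urges readers to quantify.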

  • Research Article
  • Citations: 4
  • 10.1093/jhered/esaa001
Population Genomics Training for the Next Generation of Conservation Geneticists: ConGen 2018 Workshop
  • Mar 10, 2020
  • Journal of Heredity
  • Amanda Stahlke + 10 more

The increasing availability and complexity of next-generation sequencing (NGS) data sets make ongoing training an essential component of conservation and population genetics research. A workshop entitled “ConGen 2018” was recently held to train researchers in conceptual and practical aspects of NGS data production and analysis for conservation and ecological applications. Sixteen instructors provided helpful lectures, discussions, and hands-on exercises regarding how to plan, produce, and analyze data for many important research questions. Lecture topics ranged from understanding probabilistic (e.g., Bayesian) genotype calling to the detection of local adaptation signatures from genomic, transcriptomic, and epigenomic data. We report on progress in addressing central questions of conservation genomics, advances in NGS data analysis, the potential for genomic tools to assess adaptive capacity, and strategies for training the next generation of conservation genomicists.

  • Research Article
  • Citations: 267
  • 10.7717/peerj-cs.20
CFSAN SNP Pipeline: an automated method for constructing SNP matrices from next-generation sequence data
  • Aug 26, 2015
  • PeerJ Computer Science
  • Steve Davis + 6 more

The analysis of next-generation sequence (NGS) data is often a fragmented step-wise process. For example, multiple pieces of software are typically needed to map NGS reads, extract variant sites, and construct a DNA sequence matrix containing only single nucleotide polymorphisms (i.e., a SNP matrix) for a set of individuals. The management and chaining of these software pieces and their outputs can often be a cumbersome and difficult task. Here, we present CFSAN SNP Pipeline, which combines into a single package the mapping of NGS reads to a reference genome with Bowtie2, processing of those mapping (BAM) files using SAMtools, identification of variant sites using VarScan, and production of a SNP matrix using custom Python scripts. We also introduce a Python package (CFSAN SNP Mutator) that, when given a reference genome, will generate variants of known position against which we validate our pipeline. We created 1,000 simulated Salmonella enterica sp. enterica Serovar Agona genomes at 100× and 20× coverage, each containing 500 SNPs, 20 single-base insertions and 20 single-base deletions. For the 100× dataset, the CFSAN SNP Pipeline recovered 98.9% of the introduced SNPs and had a false positive rate of 1.04 × 10⁻⁶; for the 20× dataset, 98.8% of SNPs were recovered and the false positive rate was 8.34 × 10⁻⁷. Based on these results, CFSAN SNP Pipeline is a robust and accurate tool that is among the first to combine into a single executable the myriad steps required to produce a SNP matrix from NGS data. Such a tool is useful to those working in an applied setting (e.g., food safety traceback investigations) as well as for those interested in evolutionary questions.
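The final SNP-matrix construction step can be illustrated with a small sketch. The data structures are hypothetical; the pipeline itself builds its matrix from VarScan output with custom scripts:

```python
# Sketch of building a SNP matrix: one row per variant position, one column
# per sample, filling in the reference allele where a sample has no call.

def snp_matrix(reference, calls_by_sample):
    """reference: {pos: ref_base}; calls_by_sample: {sample: {pos: alt_base}}."""
    samples = sorted(calls_by_sample)
    positions = sorted({p for calls in calls_by_sample.values() for p in calls})
    header = ["pos"] + samples
    rows = [[pos] + [calls_by_sample[s].get(pos, reference[pos]) for s in samples]
            for pos in positions]
    return [header] + rows

ref = {42: "A", 101: "G"}
calls = {"s1": {42: "T"}, "s2": {101: "C"}, "s3": {}}
matrix = snp_matrix(ref, calls)
```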

  • Research Article
  • Citations: 17
  • 10.1136/jmedgenet-2012-101001
SNVerGUI: a desktop tool for variant analysis of next-generation sequencing data
  • Sep 28, 2012
  • Journal of Medical Genetics
  • Wei Wang + 4 more

BackgroundAdvances in next generation sequencing (NGS) technology have made it possible to interrogate comprehensively genome-wide genetic variations. However, most existing tools for variation detection are based on command-line interface, which...

  • Research Article
  • 10.1186/s43042-024-00584-5
Identification and interaction analysis of molecular markers in myocardial infarction by bioinformatics and next-generation sequencing data analysis
  • Oct 14, 2024
  • Egyptian Journal of Medical Human Genetics
  • Basavaraj Vastrad + 1 more

Background: Cardiovascular diseases are prevalent worldwide across all ages, and myocardial infarction (MI) is characterized by a sudden blockage of blood flow to the heart and permanent damage to the heart muscle, whose cause and underlying molecular mechanisms are not fully understood. This investigation aimed to explore and identify essential genes and signaling pathways that contribute to the progression of MI.
Methods: The aim of this investigation was to use bioinformatics and next-generation sequencing (NGS) data analysis to identify differentially expressed genes (DEGs) with diagnostic and therapeutic potential in MI. The NGS dataset (GSE132143) was downloaded from the Gene Expression Omnibus (GEO) database. DEGs between MI and normal control samples were identified using the DESeq2 R Bioconductor tool. Gene ontology (GO) and REACTOME pathway enrichment analyses of the DEGs were performed using g:Profiler. Next, four kinds of algorithms in the protein–protein interaction (PPI) analysis were applied to identify potential novel biomarkers. miRNA-hub gene and TF-hub gene regulatory networks were constructed with the miRNet and NetworkAnalyst databases and Cytoscape software. Finally, the diagnostic effectiveness of hub genes was predicted by receiver operating characteristic (ROC) curve analysis; an AUC above 0.800 was considered capable of diagnosing MI with excellent specificity and sensitivity.
Results: A total of 958 DEGs were identified, consisting of 480 up-regulated and 478 down-regulated genes. The enriched GO terms and pathways of the DEGs include immune system, neuronal system, response to stimulus, and multicellular organismal process. Ten hub genes (cftr, cdk1, rps13, rps15a, rps27, notch1, mrpl12, nos2, ccdc85b and atn1) were obtained from the protein–protein interaction analysis. The miRNA-hub gene and TF-hub gene regulatory networks showed that hsa-mir-409-3p, hsa-mir-3200-3p, creb1 and tp63 might play an important role in MI.
Conclusions: Analysis of a next-generation sequencing dataset combined with global network information and validation presents a successful approach to uncovering risk hub genes and prognostic markers of MI. Our investigation identified ten risk- and prognostic-related genes: cftr, cdk1, rps13, rps15a, rps27, notch1, mrpl12, nos2, ccdc85b and atn1. This gene set contributes a new perspective to improving the diagnostic, prognostic, and therapeutic outcomes of MI.
