Abstract
Gene set enrichment analysis (GSEA) is a powerful tool for associating a disease phenotype with a group of genes/proteins. GSEA assigns each gene/protein in the input list a weight that depends on a chosen metric, usually quantitative expression data. However, expression data are not always available. Here, GSEA based on the betweenness centrality of a protein–protein interaction (PPI) network is described and applied to two cases in which an expression metric is missing. First, personalized PPI networks were generated from genes displaying alterations (assessed by array comparative genomic hybridization and whole exome sequencing) in four probands who shared a 16p13.11 microdeletion and carried several other point variants. The patients showed disease phenotypes linked to neurodevelopment. All networks were assembled around a cluster of first interactors of altered genes with high betweenness centrality. All four clusters included genes known to be involved in neurodevelopmental disorders, with differing centrality. Moreover, the GSEA results highlighted “cell cycle” among the enriched pathways. Second, a large interaction network obtained by merging proteomics studies on three neurodegenerative disorders was analyzed from a topological point of view. We observed that the most central proteins are often linked to Parkinson’s disease. Selecting these proteins improved the specificity of GSEA, with “Metabolism of amino acids and derivatives” and “Cellular response to stress or external stimuli” as the top-ranked enriched pathways. In conclusion, betweenness centrality proved to be a suitable metric for GSEA. Thus, centrality-based GSEA represents an opportunity for precision medicine and network medicine.
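To make the ranking idea concrete, the following is a minimal sketch of how betweenness centrality could stand in for an expression metric when building a ranked gene list. It is not the authors' implementation: the gene names (GENE_A to GENE_E), the toy edge list, and the helper names are hypothetical, and the centrality is computed with Brandes' algorithm on an unweighted, undirected graph, which is an assumption about the network type. The resulting ranked list is the kind of input a pre-ranked GSEA tool would expect.

```python
from collections import deque


def build_adjacency(edges):
    """Turn a list of undirected (a, b) interactions into an adjacency dict."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def betweenness_centrality(adjacency):
    """Unnormalized betweenness (Brandes' algorithm, unweighted, undirected)."""
    centrality = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search from `source`, counting shortest paths.
        visit_order = []
        predecessors = {node: [] for node in adjacency}
        n_paths = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        n_paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            visit_order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:  # first time w is reached
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:  # v lies on a shortest path to w
                    n_paths[w] += n_paths[v]
                    predecessors[w].append(v)
        # Accumulate pair dependencies, farthest nodes first.
        dependency = {node: 0.0 for node in adjacency}
        while visit_order:
            w = visit_order.pop()
            for v in predecessors[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Each unordered pair was counted from both endpoints, so halve the totals.
    return {node: value / 2 for node, value in centrality.items()}


def ranked_gene_list(edges):
    """Rank genes by decreasing betweenness; ties are broken alphabetically."""
    scores = betweenness_centrality(build_adjacency(edges))
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy network; these are not real interactions.
toy_edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_B", "GENE_C"),
    ("GENE_C", "GENE_D"),
    ("GENE_C", "GENE_E"),
]

ranked = ranked_gene_list(toy_edges)
for gene, score in ranked:
    print(gene, score)
```

In this toy network GENE_C bridges the most shortest paths and therefore ranks first, followed by GENE_B; the three leaf genes score zero. In a real analysis the edge list would come from a PPI resource and the scores would be passed to an enrichment tool as the ranking metric.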
Highlights
High-throughput data consist of a large amount of information produced by latest-generation technologies
Four patients were referred for genetic investigation for diagnostic purposes and counseling for developmental disorders ranging from learning delay to intellectual disability, with or without associated congenital malformations
The datasets presented in this study can be found in online repositories
Summary
High-throughput data consist of a large amount of information produced by latest-generation technologies. Omics approaches generate data to systematically explore human biology at the cellular or molecular level. This offers a significant advantage when studying a biological system in its full complexity (Tebani et al, 2016). Integrating the medical/biological language with the mathematical/computational language in a cross-disciplinary approach remains a challenge (Barabási, 2007). For this reason, new theories and algorithms have been developed, and strong support from bioinformatics tools has become necessary (Al-Haggar et al, 2013).