Abstract
Patient classification based on clinical and genomic data will further the goal of precision medicine. Interpretability is of particular relevance for models based on genomic data, where sample sizes are relatively small (in the hundreds), increasing the risk of overfitting. netDx is a machine learning method that integrates multi-modal patient data to build a patient classifier. Patient data are converted into patient similarity networks, a representation that is intuitive to clinicians, who also use patient similarity for medical diagnosis. Features passing selection are integrated, and new patients are assigned to the class with the greatest profile similarity. netDx has excellent performance, outperforming most machine-learning methods in binary cancer survival prediction. It handles missing data, a common problem in real-world data, without requiring imputation. netDx also has excellent interpretability, with native support for grouping genes into pathways for mechanistic insight into predictive features. The netDx Bioconductor package provides multiple workflows for users to build custom patient classifiers. It provides turnkey functions for one-step predictor generation from multi-modal data, including feature selection over multiple train/test data splits. Workflows offer versatility through custom feature design and choice of similarity metric, and parallel execution improves speed. Built-in functions and examples allow users to compute model performance metrics such as AUROC, AUPR, and accuracy. netDx uses RCy3 to visualize top-scoring pathways and the final integrated patient network in Cytoscape. Advanced users can build more complex predictor designs from the functional building blocks used in the default design. Finally, the netDx Bioconductor package provides a novel workflow for pathway-based patient classification from sparse genetic data.
Highlights
Supervised learning methods are useful in clinical genomics for disease diagnosis, risk stratification for prognosis, and evaluating treatment response.
Most machine learning methods do not handle missing data, a common feature of real-world datasets, without prior data imputation or filtering. netDx is a supervised learning algorithm that classifies patients by integrating multi-modal patient data [2]. It is notable among machine learning methods for handling missing data without imputation, and it excels at interpretability by enabling users to create biologically meaningful groupings of features, such as grouping genes into pathway-level features. netDx integrates multi-modal data by converting each data layer into a patient similarity network and then integrating these networks (Figure 1a).
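To make the idea of patient similarity networks concrete, here is a minimal Python sketch. It is not the netDx API (netDx is an R/Bioconductor package), and every function name, data value, and class label below is hypothetical. It assumes Pearson correlation as the similarity metric, skips missing measurements instead of imputing them, and uses a plain average of edge weights as a stand-in for netDx's feature selection and network integration, which are more involved.

```python
import math

def pearson_similarity(a, b):
    """Pearson correlation over features measured in both patients (None = missing)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if len(pairs) < 2:
        return None  # too few shared measurements to define an edge
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return None  # correlation is undefined for a constant profile
    return cov / (sx * sy)

def similarity_network(layer):
    """Turn one data layer (patient id -> feature list) into weighted edges."""
    ids = sorted(layer)
    net = {}
    for i, p in enumerate(ids):
        for q in ids[i + 1:]:
            s = pearson_similarity(layer[p], layer[q])
            if s is not None:
                net[(p, q)] = s
    return net

def integrate(networks):
    """Average each edge over the networks in which it exists (simplifying assumption)."""
    totals, counts = {}, {}
    for net in networks:
        for edge, w in net.items():
            totals[edge] = totals.get(edge, 0.0) + w
            counts[edge] = counts.get(edge, 0) + 1
    return {e: totals[e] / counts[e] for e in totals}

def classify(patient, labels, integrated):
    """Assign the patient to the class whose members are most similar on average."""
    scores = {}
    for cls in set(labels.values()):
        sims = []
        for member, member_cls in labels.items():
            edge = tuple(sorted((patient, member)))
            if member_cls == cls and edge in integrated:
                sims.append(integrated[edge])
        if sims:
            scores[cls] = sum(sims) / len(sims)
    return max(scores, key=scores.get)

# Toy data (hypothetical): two layers, with some missing values left as None.
expression = {
    "P1": [5.0, 1.0, 4.0, 0.5],
    "P2": [4.5, 1.2, 3.8, 0.7],
    "P3": [1.0, 5.0, 0.8, 4.2],
    "P4": [0.9, 4.6, 1.1, 4.0],
    "NEW": [4.8, None, 4.1, 0.6],
}
methylation = {
    "P1": [0.9, 0.2, 0.8],
    "P2": [0.8, 0.1, 0.9],
    "P3": [0.2, 0.9, 0.1],
    "P4": [0.1, 0.8, None],
    "NEW": [0.7, 0.3, 0.9],
}
labels = {"P1": "good", "P2": "good", "P3": "poor", "P4": "poor"}

integrated = integrate([similarity_network(expression),
                        similarity_network(methylation)])
predicted = classify("NEW", labels, integrated)
print(predicted)
```

In this sketch the new patient is compared only on the measurements it shares with each labelled patient, which is one simple way to avoid imputation. A pathway-level design would build one network per gene group rather than one per data layer.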
Figure 5 shows an example of an enrichment map generated by running the predictor with more realistic parameter values and all available pathways. The integrated patient similarity network, based on the top-scoring features, can also be visualized.
Summary
netDx uses RCy3 to visualize top-scoring pathways and the final integrated patient network in Cytoscape. The netDx Bioconductor package provides a novel workflow for pathway-based patient classification from sparse genetic data.
This article is included in the Bioconductor, Cytoscape, and RPackage gateways.
Revisions: 1. The code in Use Case 4 had outdated function calls. 3. Function names from the outdated, original version of netDx have been removed from Table 1. Any further responses from the reviewers can be found at the end of the article.