Identifying multi-hit carcinogenic gene combinations: Scaling up a weighted set cover algorithm using compressed binary matrix representation on a GPU

Qais Al Hajri, Wu-Chun Feng, Ramu Anandakrishnan, Sajal Dash, Harold R Garner

doi:10.1038/s41598-020-58785-y

Abstract

Despite decades of research, effective treatments for most cancers remain elusive. One reason is that different instances of cancer result from different combinations of multiple genetic mutations (hits). Therefore, treatments that may be effective in some cases are not effective in others. We previously developed an algorithm for identifying combinations of carcinogenic genes with mutations (multi-hit combinations), which could suggest a likely cause for individual instances of cancer. Most cancers are estimated to require three or more hits. However, the computational complexity of the algorithm scales exponentially with the number of hits, making it impractical for identifying combinations of more than two hits. To identify combinations of greater than two hits, we used a compressed binary matrix representation, and optimized the algorithm for parallel execution on an NVIDIA V100 graphics processing unit (GPU). With these enhancements, the optimized GPU implementation was on average an estimated 12,144 times faster than the original integer matrix based CPU implementation, for the 3-hit algorithm, allowing us to identify 3-hit combinations. The 3-hit combinations identified using a training set were able to differentiate between tumor and normal samples in a separate test set with 90% overall sensitivity and 93% overall specificity. We illustrate how the distribution of mutations in tumor and normal samples in the multi-hit gene combinations can suggest potential driver mutations for further investigation. With experimental validation, these combinations may provide insight into the etiology of cancer and a rational basis for targeted combination therapy.
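The compressed binary matrix idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes each gene's mutation status across samples is a 0/1 row, packs every row into one integer, and counts the samples mutated in all genes of a combination with a bitwise AND followed by a population count. The function names, the toy data, and the simple score (covered tumor samples plus uncovered normal samples) are hypothetical and are not taken from the paper; the authors' implementation runs on a GPU, while this is plain CPU Python.

```python
from itertools import combinations

def pack_rows(matrix):
    """Pack each gene's 0/1 row (one entry per sample) into a single int."""
    packed = []
    for row in matrix:
        bits = 0
        for sample_idx, mutated in enumerate(row):
            if mutated:
                bits |= 1 << sample_idx  # bit i is set if sample i is mutated
        packed.append(bits)
    return packed

def samples_covered(packed, combo):
    """Count samples mutated in every gene of combo (AND, then popcount)."""
    acc = packed[combo[0]]
    for gene in combo[1:]:
        acc &= packed[gene]
    return bin(acc).count("1")

def best_combination(tumor, normal, hits):
    """Exhaustively score all h-gene combinations; score is an assumption."""
    n_normal = len(normal[0])
    packed_t, packed_n = pack_rows(tumor), pack_rows(normal)

    def score(combo):
        true_pos = samples_covered(packed_t, combo)             # tumors covered
        true_neg = n_normal - samples_covered(packed_n, combo)  # normals not covered
        return true_pos + true_neg

    best = max(combinations(range(len(tumor)), hits), key=score)
    return best, score(best)

# Toy data: rows are genes, columns are samples (hypothetical values).
tumor = [[1, 1, 0, 1], [1, 1, 1, 0], [0, 1, 1, 1]]
normal = [[1, 0, 0], [0, 0, 1], [0, 1, 1]]
print(best_combination(tumor, normal, hits=2))
```

Packing the rows means one AND operation compares many samples at once, which is the property that makes the representation attractive for parallel hardware.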

Highlights

Despite decades of research, effective treatments for most cancers remain elusive
The computational complexity of the algorithm, which is $O(G^h \times C \times (N_t + N_n))$, limits the combinations that can be practically identified to 2-hit ($h = 2$) combinations, where $G \approx 20000$ is the number of genes with mutations in the input data, $C$ is the number of combinations identified by the algorithm, $N_t$ is the number of input tumor samples, $N_n$ is the number of input normal samples, and $h$ is the number of hits
We had previously developed an algorithm for identifying a set of 2-hit combinations of genes with mutations that was able to differentiate between tumor and normal samples with high sensitivity and specificity[41]
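To make the growth in the number of gene combinations concrete, the short sketch below counts the unordered gene combinations that an exhaustive search would have to consider for 2 and 3 hits, using the approximate gene count of 20000 from the highlights. The function name is hypothetical, and counting unordered combinations with a binomial coefficient is an assumption about how candidates are enumerated; it is bounded above by the gene count raised to the number of hits.

```python
from math import comb

def candidate_combinations(num_genes, hits):
    """Number of unordered gene combinations of size `hits`."""
    return comb(num_genes, hits)

G = 20_000  # approximate number of mutated genes, per the highlights
two_hit = candidate_combinations(G, 2)
three_hit = candidate_combinations(G, 3)

print(two_hit)               # 199990000
print(three_hit)             # 1333133340000
print(three_hit // two_hit)  # 6666
```

Under these assumptions, moving from 2 hits to 3 hits multiplies the number of candidate combinations by several thousand before the per-combination cost over samples is even considered.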

Summary

Introduction

Effective treatments for most cancers remain elusive. One reason is that different instances of cancer result from different combinations of multiple genetic mutations (hits). The goal of this work is to optimize the multi-hit algorithm to identify combinations of more than two hits in a practical time frame (


Journal: Scientific Reports	Publication Date: Feb 6, 2020
Citations: 11	License type: open-access

Similar Papers

Differential Allele-Specific Expression Uncovers Breast Cancer Genes Dysregulated by Cis Noncoding Mutations.
Pawel F Przytycki ... Mona Singh
Cell Systems | VOL. 10
01 Feb 2020

Mutational Analysis of Field Cancerization in Bladder Cancer
Trine Strandgaard ... Mathilde Borg Houlberg Thomsen
Bladder Cancer | VOL. 6
21 Sep 2020

Quantitative profiling of colorectal cancer-associated bacteria reveals associations between fusobacterium spp., enterotoxigenic Bacteroides fragilis (ETBF) and clinicopathological features of colorectal cancer.
Katie S Viljoen ... Andrew Mcdowell
PloS one | VOL. 10
09 Mar 2015

Abstract 1900: Rapid variant detection and annotations from next generation sequencing data using a GPU accelerated framework
Pankaj Vats ... Timothy T Harkins
American Journal of Cancer | VOL. 82
15 Jun 2022
