Abstract

Identifying the cell of origin of a cancer is important for guiding treatment decisions. Machine learning approaches have been proposed to classify the cell of origin from somatic mutation profiles obtained from solid biopsies. However, solid biopsies can cause complications, and some tumors are not accessible. Liquid biopsies are a promising alternative, but their somatic mutation profiles are sparse, and current machine learning models perform poorly in this setting. We propose an improved method for handling sparsity in liquid biopsy data. First, data augmentation is performed on sparse data to enhance model robustness. Second, we employ data integration to merge information from (i) SNV density, (ii) SNVs in driver genes, and (iii) trinucleotide motifs. Our adapted method achieves average accuracies of 0.88 and 0.65 on data where only 70% and 2% of SNVs are retained, respectively, compared with 0.83 and 0.41 for the original model. The method and results presented here open the way for applying machine learning to the detection of the cell of origin of cancer from liquid biopsy data.
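
The two ingredients named in the abstract, augmentation by sparsification and integration of three SNV-derived feature types, can be illustrated with a minimal sketch. The abstract does not say how either step is implemented, so everything here is an assumption made for illustration: SNVs are dropped uniformly at random, integration is a plain concatenation of feature blocks, and the names `downsample_snvs` and `build_features`, the tuple layout, the bin size, and the toy data are all hypothetical.

```python
import random
from collections import Counter

# Each SNV is a hypothetical tuple: (position, gene or None, trinucleotide motif).

def downsample_snvs(snvs, keep_fraction, seed=0):
    """Keep a uniformly random subset of SNVs (assumed sparsification scheme)."""
    rng = random.Random(seed)
    n_keep = max(1, round(len(snvs) * keep_fraction))
    return rng.sample(snvs, n_keep)

def build_features(snvs, n_bins, bin_size, driver_genes, motifs):
    """Concatenate three feature blocks (assumed integration by concatenation)."""
    # (i) SNV density: number of SNVs per fixed-size genomic bin.
    bin_counts = Counter(pos // bin_size for pos, _, _ in snvs)
    density = [bin_counts.get(b, 0) for b in range(n_bins)]
    # (ii) Driver genes: 1 if the gene carries at least one SNV, else 0.
    hit_genes = {gene for _, gene, _ in snvs if gene is not None}
    drivers = [1 if g in hit_genes else 0 for g in driver_genes]
    # (iii) Trinucleotide motifs: count of SNVs per motif.
    motif_counts = Counter(motif for _, _, motif in snvs)
    motif_vec = [motif_counts.get(m, 0) for m in motifs]
    return density + drivers + motif_vec

# Toy profile: 100 SNVs spaced 1 kb apart on a single made-up contig.
snvs = [
    (i * 1000, "TP53" if i % 10 == 0 else None, "A[C>T]G" if i % 2 else "T[C>A]A")
    for i in range(100)
]

dense = downsample_snvs(snvs, 0.70)   # roughly the liquid-biopsy detection rate
sparse = downsample_snvs(snvs, 0.02)  # the very sparse setting
features = build_features(
    sparse,
    n_bins=10,
    bin_size=10_000,
    driver_genes=["TP53", "KRAS"],
    motifs=["A[C>T]G", "T[C>A]A"],
)
```

Training a classifier on several such sparsified copies of each profile is one plausible reading of "data augmentation on sparse data"; the original method may differ in how SNVs are dropped and in how the feature blocks are combined.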

Highlights

  • Tailoring targeted treatments to the cell of origin can result in more successful treatment and better prognosis for ‘cancer of unknown primary’ (CUP) patients, where the cell of origin is often difficult to determine with standard histopathology techniques [3]

  • The first measurement was set at 70%, as this value approximates the reported percentage of somatic mutations detectable in liquid biopsies relative to solid tissue biopsies in most advanced and metastatic cancers [17,19]

Introduction

Conventional solid tissue biopsy can only capture tumor heterogeneity to a limited extent, as only one spatial location of the tumor is sampled, while ctDNA originates from the whole tumor and captures more of the tumor heterogeneity [10,11]. Another advantage of liquid biopsies is that …

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
