A Biterm Topic Model for Sparse Mutation Data

Itay Sason, Roded Sharan, Mark D M Leiserson, Yuexi Chen

doi:10.3390/cancers15051601

Open Access

https://doi.org/10.3390/cancers15051601

Abstract

Mutational signature analysis promises to reveal the processes that shape cancer genomes for applications in diagnosis and therapy. However, most current methods are geared toward rich mutation data extracted from whole-genome or whole-exome sequencing. Methods that process the sparse mutation data typically found in practice are only in the earliest stages of development. In particular, we previously developed the Mix model, which clusters samples to handle data sparsity. However, the Mix model had two hyper-parameters, the number of signatures and the number of clusters, that were very costly to learn. Therefore, we devised a new method for sparse data that was several orders of magnitude more efficient, was based on mutation co-occurrences, and imitated word co-occurrence analyses of Twitter texts. We showed that the model produced significantly improved hyper-parameter estimates that led to higher likelihoods of discovering overlooked data and had better correspondence with known signatures.
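
To make the idea of mutation co-occurrences concrete, the following is a minimal sketch of how biterms (unordered pairs of items that appear together in one short document) could be counted when each sample is treated as a short list of mutation categories. It is an illustration under stated assumptions, not the authors' implementation: the function names `extract_biterms` and `count_biterms`, the simplified category labels, and the toy samples are all hypothetical, and the sketch covers only the counting step, not the topic model itself.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(sample):
    """Return every unordered pair of mutations in one sample.

    Each pair is stored as a sorted tuple so that (a, b) and (b, a)
    count as the same biterm. A sample with fewer than two mutations
    yields no biterms.
    """
    return [tuple(sorted(pair)) for pair in combinations(sample, 2)]


def count_biterms(samples):
    """Aggregate biterm counts over all samples."""
    counts = Counter()
    for sample in samples:
        counts.update(extract_biterms(sample))
    return counts


# Hypothetical toy data: each sample is a short list of mutation
# categories (simplified labels chosen for illustration only).
samples = [
    ["C>T", "C>T", "T>A"],
    ["C>T", "T>A"],
    ["C>G"],
]

biterm_counts = count_biterms(samples)
for biterm, n in sorted(biterm_counts.items()):
    print(biterm, n)
```

Pooling pairs across all samples in this way means that even a sample with only a handful of mutations contributes to corpus-level co-occurrence statistics, which is the property that makes biterm-style models attractive for sparse data.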

Journal: Cancers
Publication Date: Mar 4, 2023
License type: CC BY 4.0
