Abstract
Genomic variations are a focus of research into the mechanisms of host-pathogen interactions and of diseases such as cancer. Today, they are detected by analyzing next-generation sequencing (NGS) data with dedicated pipelines. Surrogate (simulated) NGS data with known genomic variations help to evaluate these pipelines and validate their outcomes, supporting the selection of appropriate tools for a given scientific question. I describe how existing approaches for simulating NGS data together with genomic variations fail to model local enrichments of single nucleotide polymorphisms (SNPs), so-called SNP clusters. Two distributions for count data are applied to publicly available collections of genomic variations. The results suggest that SNP cluster sizes should be modeled with overdispersion-aware distributions.
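To make the idea of overdispersion concrete, the following is a minimal sketch and not the method used in the study. The abstract names neither the two count distributions nor the rule for calling a SNP cluster. The sketch therefore assumes a Poisson model (variance equal to the mean) versus a negative binomial model (variance may exceed the mean), a common overdispersion-aware choice. It also assumes a hypothetical clustering rule in which neighbouring SNPs at most `max_gap` bp apart belong to the same cluster. The function names `cluster_sizes`, `poisson_loglik`, `negbin_loglik` and `compare_fits` are illustrative only. The negative binomial is fitted by the method of moments rather than maximum likelihood, and the sketch ignores that cluster sizes start at 1, where a zero-truncated or shifted model may be more appropriate.

```python
import math
from statistics import mean, pvariance


def cluster_sizes(positions, max_gap):
    """Hypothetical clustering rule: SNPs whose neighbouring positions lie at
    most max_gap bp apart form one cluster. Returns SNP counts per cluster."""
    sizes = []
    last = None
    for pos in sorted(positions):
        if last is not None and pos - last <= max_gap:
            sizes[-1] += 1  # extend the current cluster
        else:
            sizes.append(1)  # start a new cluster
        last = pos
    return sizes


def poisson_loglik(counts, lam):
    """Log-likelihood of the counts under a Poisson(lam) model."""
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)


def negbin_loglik(counts, r, p):
    """Log-likelihood under a negative binomial with size r and success
    probability p (mean r(1-p)/p, variance r(1-p)/p**2)."""
    return sum(
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log1p(-p)
        for k in counts
    )


def compare_fits(counts):
    """Compare Poisson and negative binomial fits via AIC (lower is better)."""
    m = mean(counts)
    v = pvariance(counts)
    result = {
        "mean": m,
        "variance": v,
        "dispersion_index": v / m,  # > 1 indicates overdispersion
        # The Poisson MLE of lam is the sample mean; the model has 1 parameter.
        "aic_poisson": 2 * 1 - 2 * poisson_loglik(counts, m),
    }
    if v > m:
        # Method-of-moments estimates: variance = m + m**2 / r.
        r = m * m / (v - m)
        p = r / (r + m)
        result["aic_negbin"] = 2 * 2 - 2 * negbin_loglik(counts, r, p)
    return result


if __name__ == "__main__":
    # Toy data: ten isolated SNPs plus one dense cluster of ten SNPs.
    positions = [1000 * i for i in range(1, 11)] + [20000 + 3 * j for j in range(10)]
    sizes = cluster_sizes(positions, max_gap=10)
    print(sizes)  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 10]
    print(compare_fits(sizes))
```

On this toy input the variance of the cluster sizes is well above their mean. The negative binomial therefore obtains a lower AIC than the Poisson model, which illustrates in miniature why a variance-equals-mean model can underrepresent large SNP clusters.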
More From: Journal of Computational Biology