Abstract

CpG islands (CGIs) are important epigenetic markers for identifying biological elements and processes in mammalian genomes, and they play significant roles in DNA methylation, gene regulation, epigenetic inheritance, gene mutation, chromosome inactivation and nucleosome retention. Investigating CpG islands and their structures is challenging because the structures are unknown and the number of possible patterns is exponentially large. In this paper, we design and implement a purpose-built application for CpG island investigation that combines the strengths of the Apache Spark platform and its programming paradigm with the particular properties of DNA genome sequences. A novel CpG box model and a Markov model are developed primarily to redefine and investigate CpG islands. Both models fit readily onto Spark-based cloud platforms, which can greatly accelerate the analysis. Two types of evaluation are performed: one measures accuracy and the other measures computing performance. This paper describes how the application assists processing and genomic analysis in epigenetic studies.
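For orientation, the sketch below shows the conventional window-based CpG island test that definitions of this kind usually start from (GC content and the observed/expected CpG ratio, with the commonly cited thresholds of length at least 200 bp, GC content above 50%, and ratio above 0.6). It is not the CpG box model or the Markov model of this paper, whose details are not given in the abstract; the function names and default thresholds are assumptions chosen for illustration.

```python
def cpg_window_stats(seq):
    """Return (gc_fraction, observed/expected CpG ratio) for one DNA window."""
    s = seq.upper()
    n = len(s)
    if n == 0:
        return 0.0, 0.0
    c = s.count("C")
    g = s.count("G")
    # "CG" dinucleotides cannot overlap each other, so str.count is exact here.
    cpg = s.count("CG")
    gc_fraction = (c + g) / n
    # Expected CpG count if C and G were placed independently in the window.
    expected = (c * g) / n
    ratio = cpg / expected if expected > 0 else 0.0
    return gc_fraction, ratio


def looks_like_cgi(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the conventional thresholds (illustrative defaults, not the paper's)."""
    gc_fraction, ratio = cpg_window_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and ratio > min_ratio
```

Because each window is scored independently, a function like this could in principle be mapped over sequence chunks on a distributed platform, which is the kind of parallelism the abstract attributes to Spark.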
