Abstract
A major limitation of computational tools for analyzing ChIP-seq data is that most of them ignore non-unique tags (NUTs), that is, tags that match the human genome at more than one location, even though NUTs comprise up to 60% of all raw tags in ChIP-seq data. Effectively utilizing these NUTs would increase the sequencing depth and allow more accurate detection of enriched binding sites, which in turn could lead to more precise and significant biological interpretations. In this study, we have developed a computational tool, LOcating Non-Unique matched Tags (LONUT), to improve the detection of enriched regions from ChIP-seq data. The LONUT algorithm applies linear and polynomial regression models to establish an empirical score (ES) formula that considers two influential factors: the distance of NUTs to peaks identified using uniquely matched tags (UMTs), and the enrichment score for those peaks. Using this score, each NUT is assigned to a unique location on the reference genome. The newly located tags from the set of NUTs are combined with the original UMTs to produce a final set of combined matched tags (CMTs). LONUT was tested on many different datasets representing three different characteristics of biological data types. The detected sites were validated using de novo motif discovery and ChIP-PCR. We demonstrate the specificity and accuracy of LONUT and show that our program not only improves the detection of binding sites for ChIP-seq, but also identifies additional binding sites.
Highlights
Next-generation sequencing technologies have been widely used to address many biological and medical questions on a genomewide scale
The first step of LOcating Non-Unique matched Tags (LONUT) is to divide the input dataset, taken from the Bowtie aligned tags file, into two subsets: a set of uniquely matched tags (UMTs) and a set of non-unique tags (NUTs)
We combine the set of newly located tags from the NUTs with the set of original UMTs to produce a final set of combined matched tags (CMTs)
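The split, locate and combine steps described above can be illustrated with a minimal Python sketch. It is not the LONUT implementation. The input layout (tuples of tag ID, chromosome and position), the function names, and in particular `placeholder_score` are assumptions made for illustration; the actual ES formula is fitted by regression in the paper and is not reproduced here. Dropping NUTs that have no peak on any candidate chromosome is likewise an assumption.

```python
from collections import defaultdict


def split_tags(alignments):
    """Split alignments into UMTs (one location) and NUTs (several locations).

    `alignments` is an iterable of (tag_id, chrom, pos) tuples; this layout is
    an assumption, not the Bowtie output format.
    """
    by_tag = defaultdict(list)
    for tag_id, chrom, pos in alignments:
        by_tag[tag_id].append((chrom, pos))
    umts = {t: locs[0] for t, locs in by_tag.items() if len(locs) == 1}
    nuts = {t: locs for t, locs in by_tag.items() if len(locs) > 1}
    return umts, nuts


def placeholder_score(distance, enrichment):
    """Hypothetical stand-in for the ES formula (NOT the paper's fitted model).

    It only encodes the two factors named in the text: closer peaks and more
    enriched peaks score higher.
    """
    return enrichment / (1.0 + distance)


def locate_nut(candidates, peaks, score=placeholder_score):
    """Pick the candidate location that scores best against a UMT-derived peak.

    `peaks` is a list of (chrom, summit, enrichment) tuples. Returns None when
    no candidate shares a chromosome with any peak (an assumed policy).
    """
    best, best_score = None, float("-inf")
    for chrom, pos in candidates:
        for peak_chrom, summit, enrichment in peaks:
            if peak_chrom != chrom:
                continue
            s = score(abs(pos - summit), enrichment)
            if s > best_score:
                best, best_score = (chrom, pos), s
    return best


def combine(umts, nuts, peaks):
    """Merge UMTs with newly located NUTs to form the CMT set."""
    cmts = dict(umts)
    for tag_id, candidates in nuts.items():
        location = locate_nut(candidates, peaks)
        if location is not None:
            cmts[tag_id] = location
    return cmts


# Toy data: t1 is unique, t2 and t3 each align to two locations.
alignments = [
    ("t1", "chr1", 100),
    ("t2", "chr1", 500),
    ("t2", "chr2", 900),
    ("t3", "chr3", 50),
    ("t3", "chr4", 60),
]
peaks = [("chr1", 520, 10.0), ("chr2", 1900, 50.0)]

umts, nuts = split_tags(alignments)
cmts = combine(umts, nuts, peaks)
```

In this toy run, `t2` is placed at its chr1 candidate because that position lies 20 bp from a peak summit, whereas its chr2 candidate lies 1000 bp from the nearest summit; `t3` has no peak on either candidate chromosome and is left out of the combined set.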
Summary
Despite the large number of computational tools that have been developed to analyze genomic datasets generated from sequencing-based technologies, such as MACS [12], QuEST [13], SISSRs [14] and many other peak identification programs [15,16,17,18,19,20,21] for ChIP-seq data, and Cufflinks [22], Scripture [23] and SpliceTrap [24] for RNA-seq data, limitations in data analysis still exist. NUTs comprise up to 60% of all raw tags [25]. Utilizing these NUTs would increase the sequencing depth and allow more accurate detection of enriched binding sites, which in turn may lead to more precise and significant biological insights.