Abstract
Motivation: Accurate motif enrichment analyses depend on the choice of background DNA sequences, which should ideally match the sequence composition of the foreground sequences. It is important to avoid false positive enrichment due to sequence biases in the genome, such as GC-bias. Therefore, relying on an appropriate set of background sequences is crucial for enrichment analysis.
Results: We developed BiasAway, a command line tool with a dedicated, easy-to-use web server, to generate synthetic sequences matching any k-mer nucleotide composition or to select genomic DNA sequences matching the mononucleotide composition of the foreground sequences through four different models. For genomic sequences, we provide precomputed partitions of genomes from nine species with five different bin sizes to generate appropriate genomic background sequences.
Availability and implementation: BiasAway source code is freely available from Bitbucket (https://bitbucket.org/CBGR/biasaway) and can be easily installed using bioconda or pip. The web server is available at https://biasaway.uio.no and detailed documentation is available at https://biasaway.readthedocs.io.
Supplementary information: Supplementary data are available at Bioinformatics online.
Highlights
Transcription factors (TFs) are proteins that control cellular processes by binding to DNA in a sequence-specific manner to modulate gene expression (Lambert et al., 2018).
We developed BiasAway, a command line tool and its dedicated easy-to-use web server to generate synthetic sequences matching any k-mer nucleotide composition or select genomic DNA sequences matching the mononucleotide composition of the foreground sequences through four different models
The web server is available at https://biasaway.uio.no and detailed documentation is available at https://biasaway.readthedocs.io.
Summary
Transcription factors (TFs) are proteins that control cellular processes by binding to DNA in a sequence-specific manner to modulate gene expression (Lambert et al., 2018). The importance of DNA background sequences for motif overrepresentation analysis has repeatedly been highlighted (Boeva, 2016; Mariani et al., 2017; Simcha et al., 2012; Worsley Hunt et al., 2014) and several approaches have been developed to address this problem. A classical approach consists of randomly shuffling foreground sequences while preserving their mono- or di-nucleotide composition, which reduces nucleotide composition biases (Jiang et al., 2008; Roadmap Epigenomics Consortium et al., 2015; Weirauch et al., 2014). Tools such as HOMER (Heinz et al., 2010), RSAT (Nguyen et al., 2018; Thomas-Chollier et al., 2008) and GENRE (Mariani et al., 2017) offer the possibility to generate sequences that are either synthetic or genomic. Background sequences generated by BiasAway can be either synthetic or real genomic sequences that match the global or local mono- or di-nucleotide composition of user-provided sequences. BiasAway is open source, and its source code and interactive web interface are freely available at https://biasaway.uio.no.
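To make the classical shuffling approach concrete, the following is a minimal Python sketch of a mononucleotide shuffle. It is an illustration only and is not taken from the BiasAway source code; the function names `shuffle_mononucleotide` and `gc_fraction` and the example sequences are hypothetical. Preserving di-nucleotide composition requires a more involved procedure and is not covered here.

```python
import random
from collections import Counter


def shuffle_mononucleotide(seq, rng=None):
    """Return a random permutation of seq.

    Permuting the letters keeps every nucleotide count unchanged, so the
    mononucleotide composition (and hence the GC content) is preserved.
    Hypothetical helper, not part of BiasAway.
    """
    rng = rng or random.Random()
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


def gc_fraction(seq):
    """Fraction of G and C nucleotides in seq (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(seq)


# Made-up foreground sequences, for illustration only.
foreground = ["ACGTGGGCCA", "TTTTAACGAT"]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
background = [shuffle_mononucleotide(s, rng) for s in foreground]

for fg, bg in zip(foreground, background):
    # Each background sequence has the same letter counts as its foreground.
    print(fg, bg, Counter(fg) == Counter(bg), gc_fraction(bg))
```

Each synthetic background sequence has the same length and nucleotide counts as the foreground sequence it was derived from, which is the property that limits composition-driven false positives in this simple setting.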