Abstract

Summary: Better protocols and decreasing costs have made high-throughput sequencing experiments accessible even to small experimental laboratories. However, comparing one or a few experiments generated by an individual lab to the vast amount of relevant data freely available in the public domain can be limited by a lack of bioinformatics expertise. Though several tools, including genome browsers, allow such comparison at a single gene level, they do not provide a genome-wide view. We developed Heat*seq, a web-tool that allows genome-scale comparison of high-throughput experiments (chromatin immuno-precipitation followed by sequencing, RNA-sequencing and Cap Analysis of Gene Expression) provided by a user to the data in the public domain. Heat*seq currently contains over 12 000 experiments across diverse tissues and cell types in human, mouse and Drosophila. Heat*seq displays interactive correlation heatmaps, with an ability to dynamically subset datasets to contextualize user experiments. High quality figures and tables are produced and can be downloaded in multiple formats.

Availability and Implementation: Web application: http://www.heatstarseq.roslin.ed.ac.uk/. Source code: https://github.com/gdevailly.

Contact: Guillaume.Devailly@roslin.ed.ac.uk or Anagha.Joshi@roslin.ed.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

  • High-throughput sequencing is becoming routine for many biological assays, including transcriptome analysis through RNA sequencing (RNA-seq) and transcription factor (TF) binding site identification through chromatin immuno-precipitation followed by sequencing (ChIP-seq)

  • Comparing an oestrogen receptor (ER) alpha ChIP-seq experiment in MCF7 cells (Zhuang et al, 2015) to the ENCODE TFBS dataset, sub-selecting the ENCODE ER ChIP-seq experiments, revealed that the binding pattern of ERα in MCF7 cells was more similar to its binding pattern in T-47D cells than in ECC-1 cells (Figure 1A)

  • MCF7 and T-47D were derived from mammary tumours while ECC-1 is an endometrial cell line



Introduction

High-throughput sequencing is becoming routine for many biological assays, including transcriptome analysis through RNA sequencing (RNA-seq) and transcription factor (TF) binding site identification through chromatin immuno-precipitation followed by sequencing (ChIP-seq). Collaborative projects such as Bgee (Bastian et al.), ENCODE (Bernstein et al, 2012), and Roadmap Epigenomics (Kundaje et al, 2015) have generated genome-wide datasets across hundreds of cell types or tissues. Although these large datasets are freely available in the public domain, the lack of computational tools accessible to experimental scientists with little or no computational training prevents the data from being used to its full potential for discovery. Heat*seq is an interactive web tool that allows users to contextualise their sequencing data with respect to vast amounts of public data in a few minutes, without requiring any programming skills.
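To make the idea of contextualising one experiment against public data concrete, the following is a minimal sketch of a correlation-based comparison. It is not Heat*seq's implementation: the use of the Pearson coefficient, the function names `pearson` and `rank_by_similarity`, and the toy profiles (one value per gene or genomic region) are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def rank_by_similarity(user_profile, public_profiles):
    """Return (name, correlation) pairs, most similar public experiment first."""
    scores = [
        (name, pearson(user_profile, profile))
        for name, profile in public_profiles.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: each list holds one signal value per gene/region.
public_profiles = {
    "public_A": [1.0, 2.0, 3.0, 4.0],
    "public_B": [4.0, 3.0, 2.0, 1.0],
    "public_C": [1.0, 3.0, 2.0, 4.0],
}
user_profile = [2.0, 4.0, 6.0, 8.0]

ranking = rank_by_similarity(user_profile, public_profiles)
for name, r in ranking:
    print(f"{name}: r = {r:.2f}")
```

In this toy example the user profile ranks closest to `public_A` and furthest from `public_B`. A heatmap of such pairwise correlations across many experiments is the kind of genome-wide view the tool is described as providing.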

Methods
Results
User data quality control
Cell context identification
New hypotheses by data integration
Public data assessment
Conclusion
