Abstract

Background: Sequence logo plots have become a standard graphical tool for visualizing sequence motifs in DNA, RNA or protein sequences. However, standard logo plots primarily highlight enrichment of symbols, and may fail to highlight interesting depletions. Current alternatives that try to highlight depletion often produce visually cluttered logos.

Results: We introduce a new sequence logo plot, the EDLogo plot, that highlights both enrichment and depletion while minimizing visual clutter. We provide an easy-to-use and highly customizable R package, Logolas, to produce a range of logo plots, including EDLogo plots. This software also allows elements in the logo plot to be strings of characters, rather than a single character, extending the range of applications beyond the usual DNA, RNA or protein sequences. The software also includes new Empirical Bayes methods to stabilize estimates of enrichment and depletion, and thus better highlight the most significant patterns in data. We illustrate our methods and software on applications to transcription factor binding site motifs, protein sequence alignments and cancer mutation signature profiles.

Conclusions: Our new EDLogo plots and flexible software implementation can help data analysts visualize both enrichment and depletion of characters (DNA sequence bases, amino acids, etc.) across a wide range of applications.

Highlights

  • The plots represent the primary discovered motif disc1 of Early B cell factor EBF1 from ENCODE [14]. This example showcases the effectiveness of the standard logo plot in highlighting enrichments: in our opinion it does this better than the other two plots, and in this sense the other plots should be viewed as complementing the standard plot rather than replacing it

  • The Enrichment depletion logo (EDLogo) plot is most effective at highlighting depletion of bases G and C at the two positions in the middle of the sequence
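To make the contrast between the two kinds of plot concrete, the following is a minimal sketch in Python, not the Logolas R API. It compares the symbol heights of a standard logo (frequency times information content, the usual definition) with a hypothetical enrichment/depletion score taken here to be a median-centered log-ratio against a uniform background. The function names, the pseudocount `eps`, the choice of median centering, and the example frequencies are all assumptions made for illustration; the package's actual scoring options may differ.

```python
import math
from statistics import median


def standard_heights(p):
    """Standard logo symbol heights: frequency times information content (bits)."""
    entropy = -sum(x * math.log2(x) for x in p.values() if x > 0)
    info = math.log2(len(p)) - entropy
    return {b: x * info for b, x in p.items()}


def centered_log_ratio(p, q=None, eps=1e-3):
    """Hypothetical enrichment/depletion score (an assumption, not the package API):
    log2(p/q) for each symbol, shifted so that the median symbol scores zero."""
    if q is None:
        q = {b: 1 / len(p) for b in p}  # assumed uniform background
    logr = {b: math.log2((p[b] + eps) / (q[b] + eps)) for b in p}
    m = median(logr.values())
    return {b: v - m for b, v in logr.items()}


# Made-up position where G is depleted and the other bases are equally frequent.
p = {"A": 0.32, "C": 0.32, "G": 0.04, "T": 0.32}
std = standard_heights(p)
ed = centered_log_ratio(p)

for base in p:
    print(f"{base}: standard height {std[base]:.3f}, centered score {ed[base]:+.3f}")
```

In this made-up position the standard heights are all small and non-negative, with G the smallest, so the depletion is easy to miss. Under the centered score, A, C and T sit at zero and only G receives a large negative value, which is the kind of uncluttered depletion signal the highlights describe.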


Summary

Results

Panel (c) shows logo plots representing an estimated cancer mutation signature profile (signature 12). In this example the EDLogo plot and the standard logo plot differ on the enrichments they highlight at the central position: unlike the standard plot, the EDLogo plot highlights enrichment of T → C in addition to the primary enrichment C → T.

The method produces estimates of r that vary no more than is supported by the data, resulting in more parsimonious plots.

In this example (Fig. 3b) the plot shows a large N alone in the center position, highlighting that the data can be explained purely by strong enrichment of N.
