Abstract
Backgroundposition weight matrix (PWM) and sequence logo are the most widely used representations of transcription factor binding site (TFBS) in biological sequences. Sequence logo - a graphical representation of PWM, has been widely used in scientific publications and reports, due to its easiness of human perception, rich information, and simple format. Different from sequence logo, PWM works great as a precise and compact digitalized form, which can be easily used by a variety of motif analysis software. There are a few available tools to generate sequence logos from PWM; however, no tool does the reverse. Such tool to convert sequence logo back to PWM is needed to scan a TFBS represented in logo format in a publication where the PWM is not provided or hard to be acquired. A major difficulty in developing such tool to convert sequence logo to PWM is to deal with the diversity of sequence logo images.ResultsWe propose logo2PWM for reconstructing PWM from a large variety of sequence logo images. Evaluation results on over one thousand logos from three sources of different logo format show that the correlation between the reconstructed PWMs and the original PWMs are constantly high, where median correlation is greater than 0.97.ConclusionBecause of the high recognition accuracy, the easiness of usage, and, the availability of both web-based service and stand-alone application, we believe that logo2PWM can readily benefit the study of transcription by filling the gap between sequence logo and PWM.
Highlights
Position weight matrix (PWM), introduced by Stormo et al [1], is widely used for representing transcription factor binding site (TFBS) in biological sequences
PWMs are often computed from a list of aligned sequences which are potentially functionally related, and have replaced consensus sequences to be the most commonly used TFBS representation in motif discovery software and biological publications
While sequence logo is good for human perception and understanding, PWM still has advantages over sequence logo in computational field, such as its preciseness and compactness in computer storage; especially PWM is used as the standard format for motif finding and scanning [4]
Summary
We propose logo2PWM for reconstructing PWM from a large variety of sequence logo images. Evaluation results on over one thousand logos from three sources of different logo format show that the correlation between the reconstructed PWMs and the original PWMs are constantly high, where median correlation is greater than 0.97
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.