Abstract

The position weight matrix (PWM) is not only one of the most widely used bioinformatic methods, but also a key component in more advanced computational algorithms (e.g., the Gibbs sampler) for characterizing and discovering motifs in nucleotide or amino acid sequences. However, few generally applicable statistical tests are available for evaluating the significance of site patterns, PWM, and PWM scores (PWMS) of putative motifs. Statistical significance tests of the PWM output, that is, site-specific frequencies, PWM itself, and PWMS, are scattered across disparate sources and have never been collected in a single paper, with the consequence that many implementations of PWM do not include any significance test. Here I review PWM-based methods used in motif characterization and prediction (including a detailed illustration of the Gibbs sampler for de novo motif discovery), present the statistical and probabilistic rationales behind significance tests relevant to PWM, and illustrate their application with real data. The multiple comparison problem associated with the test of site-specific frequencies is best handled by false discovery rate methods. The test of PWM, due to the use of pseudocounts, is best done by resampling methods. The test of individual PWMS for each sequence segment should be based on the extreme value distribution.

Highlights

  • Most genetic switches are in the form of sequence motifs that interact with proteins [1]

  • The position weight matrix (PWM) [2,3,4,5,6] is one of the key bioinformatic tools used extensively in characterizing and predicting motifs in nucleotide and amino acid sequences. The popularity of PWM has increased further since its implementation as a component of PSI-BLAST [7], which is frequently used to generate PWM for motif characterization and prediction [8,9,10,11]

  • PWM has been applied extensively in studies of cis-regulatory elements in the genome such as translation initiation sites [12], transcription initiation sites [13], transcription factor binding sites [14,15,16,17,18,19,20,21,22], yeast intron splicing sites [23], whole-genome identification of transcription units [24], and whole-genome screening of transcription regulatory elements [25, 26]. The PWM scores (PWMSs) for individual motifs have been found to be useful as a measure of motif strength; for example, the PWMS of individual splice sites has been used as a proxy for splicing efficiency in eukaryotes [27, 28]
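To make the notions of PWM and PWMS concrete, here is a minimal Python sketch. It assumes a set of aligned sites of equal length, a uniform background distribution, and a pseudocount weighted by the background frequency; all three are illustrative choices rather than the scheme prescribed in the paper, and the function names `build_pwm`, `pwm_score`, and `scan` are hypothetical.

```python
import math
from collections import Counter


def build_pwm(sites, background=None, pseudocount=0.5, alphabet="ACGT"):
    """Return pwm[i][a] = log2(p_i(a) / background[a]) for aligned sites.

    Assumption: the pseudocount is distributed according to the background
    frequencies, so that each column's probabilities still sum to 1.
    """
    if background is None:
        # Assumed uniform background; real analyses estimate it from data.
        background = {a: 1.0 / len(alphabet) for a in alphabet}
    n = len(sites)
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all aligned sites must have the same length")
    pwm = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        column = {}
        for a in alphabet:
            p = (counts[a] + pseudocount * background[a]) / (n + pseudocount)
            column[a] = math.log2(p / background[a])
        pwm.append(column)
    return pwm


def pwm_score(pwm, segment):
    """PWMS of one segment: the sum of the per-site log-odds entries."""
    return sum(column[ch] for column, ch in zip(pwm, segment))


def scan(pwm, sequence):
    """Score every window of motif width along a sequence."""
    w = len(pwm)
    return [(i, pwm_score(pwm, sequence[i:i + w]))
            for i in range(len(sequence) - w + 1)]


if __name__ == "__main__":
    toy_sites = ["ACGT", "ACGT", "ACGA"]  # made-up sites, not real data
    pwm = build_pwm(toy_sites)
    for start, score in scan(pwm, "GGACGTGG"):
        print(start, round(score, 2))
```

A window whose PWMS is high relative to the others is a candidate motif occurrence; whether that score is statistically significant is a separate question, which is what the significance tests reviewed in the paper address.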


Summary

Xuhua Xia

Position weight matrix (PWM) is one of the most widely used bioinformatic methods, and a key component in more advanced computational algorithms (e.g., Gibbs sampler) for characterizing and discovering motifs in nucleotide or amino acid sequences. Few generally applicable statistical tests are available for evaluating the significance of site patterns, PWM, and PWM scores (PWMS) of putative motifs. I review PWM-based methods used in motif characterization and prediction (including a detailed illustration of the Gibbs sampler for de novo motif discovery), present statistical and probabilistic rationales behind statistical significance tests relevant to PWM, and illustrate their application with real data. The multiple comparison problem associated with the test of site-specific frequencies is best handled by false discovery rate methods.
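As an illustration of how a false discovery rate method handles many per-site tests at once, the sketch below implements the Benjamini-Hochberg step-up procedure. This is one common FDR method chosen here as an assumption; the text above does not say which FDR procedure is intended, and the function name and the p-values in the usage example are hypothetical.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return a list of booleans, True where a hypothesis is rejected.

    Benjamini-Hochberg step-up rule: sort the m p-values, find the largest
    rank k with p_(k) <= (k / m) * q, and reject the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * q:
            k_max = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Made-up p-values, one per motif site, for illustration only.
    site_pvalues = [0.041, 0.001, 0.6, 0.008]
    print(benjamini_hochberg(site_pvalues, q=0.05))
```

Unlike a Bonferroni correction, which compares every p-value with q/m, the threshold here grows with the rank of the p-value, so more sites can be declared significant while the expected proportion of false discoveries stays at or below q.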

Scientifica
