Position Weight Matrix and Perceptron

Xuhua Xia

doi:10.1007/978-3-319-90684-3_3

Abstract

This chapter covers two frequently used algorithms for motif characterization and prediction. The first part covers the position weight matrix (PWM) algorithm, which takes a set of aligned motif sequences (e.g., 5’ splice sites) and generates a PWM score for each input sequence that may be taken as that sequence's signal strength. In this context, PWM is loosely the sequence equivalent of principal component analysis in multivariate statistics. The algorithm also generates a PWM that highlights the non-randomness of motif patterns, together with significance tests of the site-specific motif patterns. PWM is also an essential component of algorithms for de novo motif discovery, such as the Gibbs sampler. How should background frequencies and pseudo-counts be specified when computing a PWM, and how do they affect the outcome? How can the Type I error rate be controlled in multiple comparisons by using the false discovery rate? These topics are detailed in this chapter. The second part of the chapter covers the perceptron, the simplest artificial neural network, with only a single neuron. Its function is equivalent to two-group discriminant function analysis in multivariate statistics, i.e., identifying the nucleotide or amino acid sites that provide the greatest discriminant power between two sets of sequences. The perceptron algorithm, its applications, and its limitations are illustrated in detail.
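The core PWM computation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the chapter's own implementation: the function names `build_pwm` and `pwm_score` are hypothetical, the background defaults to a uniform 0.25 per nucleotide unless one is supplied, and a total pseudo-count is added at each site and split in proportion to the background frequencies (the chapter discusses how these choices should be made, and this scheme is not necessarily the one it recommends). Each PWM entry is log2(p_ij / p_i), where p_ij is the pseudo-count-adjusted frequency of nucleotide i at site j and p_i is its background frequency; a sequence's PWM score is the sum of the entries for the nucleotides it carries.

```python
import math

ALPHABET = "ACGT"


def build_pwm(seqs, background=None, pseudo=1.0):
    """Build a PWM as a list of {nucleotide: log2 odds} dicts, one per site.

    Illustrative assumptions (hypothetical, not the chapter's exact scheme):
    - all sequences are aligned and have the same length;
    - background defaults to a uniform 0.25 per nucleotide;
    - a total pseudo-count `pseudo` is added at each site, split among the
      nucleotides in proportion to their background frequencies.
    """
    if background is None:
        background = {b: 0.25 for b in ALPHABET}
    if any(background[b] <= 0 for b in ALPHABET):
        raise ValueError("background frequencies must be positive")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    n = len(seqs)
    pwm = []
    for j in range(length):
        column = [s[j] for s in seqs]
        site = {}
        for b in ALPHABET:
            # Pseudo-count-adjusted site frequency p_ij.
            freq = (column.count(b) + pseudo * background[b]) / (n + pseudo)
            # Log-odds against the background frequency p_i.
            site[b] = math.log2(freq / background[b])
        pwm.append(site)
    return pwm


def pwm_score(pwm, seq):
    """Sum the PWM entries for the nucleotides of `seq`, site by site."""
    return sum(site[b] for site, b in zip(pwm, seq))


if __name__ == "__main__":
    # Toy aligned motif sequences (made up for illustration only).
    motifs = ["ACGT", "ACGA", "ACGT", "TCGT"]
    pwm = build_pwm(motifs)
    for s in motifs:
        print(s, round(pwm_score(pwm, s), 3))
```

Scoring every input sequence against the PWM built from those same sequences gives the list of scores that may be read as signal strengths; sequences closer to the consensus (here `ACGT`) score higher. Changing `background` or `pseudo` shifts the scores, which is one way to explore the effects the chapter examines.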
