New laboratory technologies such as DNA microarrays have made it possible to measure the expression levels of thousands of genes simultaneously in a particular cell or tissue. The challenge for genetic epidemiologists will be to develop statistical and computational methods that can identify subsets of gene expression variables that classify and predict clinical endpoints. Linear discriminant analysis is a popular multivariate statistical approach for classifying observations into groups, because its theory is well described and the method is easy to implement and interpret. However, an important limitation is that linear discriminant functions need to be prespecified. To address both this limitation and the restriction to linear functions, we have developed symbolic discriminant analysis (SDA) for the automatic selection of gene expression variables and discriminant functions that can take any form. In the present study, we demonstrate that SDA is capable of identifying combinations of gene expression variables that classify and predict autoimmune diseases.