Abstract

BackgroundOne of the drawbacks we face up when analyzing gene to phenotype associations in genomic data is the ugly performance of the designed classifier due to the small sample-high dimensional data structures (n ≪ p) at hand. This is known as the peaking phenomenon, a common situation in the analysis of gene expression data. Highly predictive bivariate gene interactions whose marginals are useless for discrimination are also affected by such phenomenon, so they are commonly discarded by state of the art sequential search algorithms. Such patterns are known as weak/marginal strong bivariate interactions. This paper addresses the problem of uncovering them in high dimensional settings.ResultsWe propose a new approach which uses the quadratic discriminant analysis (QDA) as a search engine in order to detect such signals. The choice of QDA is justified by a simulation study for a benchmark of classifiers which reveals its appealing properties. The procedure rests on an exhaustive search which explores the feature space in a blockwise manner by dividing it in blocks and by assessing the accuracy of the QDA for the predictors within each pair of blocks; the block size is determined by the resistance of the QDA to peaking. This search highlights chunks of features which are expected to contain the type of subtle interactions we are concerned with; a closer look at this smaller subset of features by means of an exhaustive search guided by the QDA error rate for all the pairwise input combinations within this subset will enable their final detection. The proposed method is applied both to synthetic data and to a public domain microarray data. When applied to gene expression data, it leads to pairs of genes which are not univariate differentially expressed but exhibit subtle patterns of bivariate differential expression.ConclusionsWe have proposed a novel approach for identifying weak marginal/strong bivariate interactions. Unlike standard approaches as the top scoring pair (TSP) and the CorScor, our procedure does not assume a specified shape of phenotype separation and may enrich the type of bivariate differential expression patterns that can be uncovered in high dimensional data.

Highlights

  • One of the drawbacks we face up when analyzing gene to phenotype associations in genomic data is the ugly performance of the designed classifier due to the small sample-high dimensional data structures (n ≪ p) at hand

  • In this paper we address the problem by evaluating the performance of a classification rule trained on the data at hand; hopefully, this will enrich the typology of interactions that might be hidden in the data

  • One of the main drawbacks for facing up this problem is the well known peaking phenomenon; it consists of the deterioration of the performance of the designed classifier when the number of inputs increases and many noisy variables are involved in fitting the classifier, so the signal gets masked and the classifier confuses it with the noise

Read more

Summary

Results

We propose a new approach which uses the quadratic discriminant analysis (QDA) as a search engine in order to detect such signals. The procedure rests on an exhaustive search which explores the feature space in a blockwise manner by dividing it in blocks and by assessing the accuracy of the QDA for the predictors within each pair of blocks; the block size is determined by the resistance of the QDA to peaking. This search highlights chunks of features which are expected to contain the type of subtle interactions we are concerned with; a closer look at this smaller subset of features by means of an exhaustive search guided by the QDA error rate for all the pairwise input combinations within this subset will enable their final detection. When applied to gene expression data, it leads to pairs of genes which are not univariate differentially expressed but exhibit subtle patterns of bivariate differential expression

Conclusions
Background
Methods
E57 E150 E128 E29 V2
Results and discussion
G82 G83 G84 G85 G86
Conclusion
11. McLachlan GJ
14. Ripley BD
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call