DNA methylation and copy number may be associated with each other, either positively or negatively. Whether differential methylation and copy number variation have a combined effect on gene expression is largely unknown. We use a multivariate linear model to formulate the relationship among three genomic measurements: gene expression, copy number, and methylation level. We propose a method that combines a distance covariance measure with the group LASSO to analyze multiple types of genomic data jointly. The goal is to reveal how gene expression may be affected by both copy number variation and differential methylation in cellular processes. The approach has two stages. The first is a variable screening step that selects variables using the joint distance covariance (JdCov) of random vectors. The second applies a penalized regression method, the group LASSO, to the screened data, which are of much lower dimension. The two-stage approach is evaluated in extensive simulation studies and shown to be effective. It is then applied to the TCGA melanoma data, which consist of gene expression, methylation, and copy number measurements for more than 300 patients, and it reveals relationships between the expression of some genes and their methylation and copy number measurements in these subjects.
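The two-stage idea (screen with a distance-covariance measure, then fit a group LASSO on the few surviving groups) can be illustrated with a minimal sketch. This is not the authors' implementation. It swaps the joint distance covariance (JdCov) for the ordinary pairwise sample distance covariance between one gene's expression and each candidate gene's (copy number, methylation) pair. It also uses simulated data and hypothetical helper names (`dcov2`, `screen_groups`, `group_lasso`), and the screening size `d` and penalty `lam` are arbitrary choices. Each candidate gene contributes one group of two columns, so the group LASSO keeps or drops its copy number and methylation effects together.

```python
import numpy as np


def dcov2(x, y):
    """Squared sample distance covariance (V-statistic) between x and y.

    Hypothetical helper; x and y have n rows, 1-D inputs become single columns.
    Stands in for JdCov, which the source uses but this sketch does not implement.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    a = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)  # pairwise distances
    b = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)
    # Double-center each distance matrix.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    return float((A * B).mean())


def screen_groups(y, groups, d):
    """Stage 1 (hypothetical): rank groups by dCov^2 with y and keep the top d."""
    scores = np.array([dcov2(g, y) for g in groups])
    return [int(i) for i in np.argsort(scores)[::-1][:d]]


def group_lasso(X, y, group_index, lam, n_iter=2000):
    """Stage 2 (hypothetical): group LASSO by proximal gradient descent.

    Minimizes (1/(2n)) ||y - X b||^2 + lam * sum_g sqrt(p_g) ||b_g||_2,
    where group_index[j] is the group label of column j.
    """
    n, p = X.shape
    beta = np.zeros(p)
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    labels = np.unique(group_index)
    for _ in range(n_iter):
        grad = -X.T @ (y - X @ beta) / n
        z = beta - step * grad
        for g in labels:
            idx = group_index == g
            norm = np.linalg.norm(z[idx])
            thresh = step * lam * np.sqrt(idx.sum())
            # Block soft-thresholding: shrink the whole group, or zero it out.
            z[idx] = 0.0 if norm <= thresh else (1 - thresh / norm) * z[idx]
        beta = z
    return beta


# Simulated example: expression of one gene driven by copy number and
# methylation of gene 0 and copy number of gene 3 (all values are made up).
rng = np.random.default_rng(0)
n, G = 200, 50
cn = rng.standard_normal((n, G))    # copy number, one column per gene
meth = rng.standard_normal((n, G))  # methylation, one column per gene
expr = 1.5 * cn[:, 0] - 1.0 * meth[:, 0] + 0.8 * cn[:, 3] + 0.5 * rng.standard_normal(n)

groups = [np.column_stack([cn[:, j], meth[:, j]]) for j in range(G)]
screened = screen_groups(expr, groups, d=10)

X = np.column_stack([groups[j] for j in screened])
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize columns
y = expr - expr.mean()                    # center response (no intercept)
group_index = np.repeat(np.arange(len(screened)), 2)
beta = group_lasso(X, y, group_index, lam=0.1)

selected = [screened[k] for k in range(len(screened)) if np.any(beta[2 * k:2 * k + 2] != 0)]
print("screened:", screened)
print("selected:", selected)
```

Screening first keeps the pairwise distance computations and the penalized fit small. Only `d` groups enter the group LASSO, mirroring the "much lower dimension" motivation for the second stage.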