Abstract

Data with multivariate count responses frequently occur in modern applications. The commonly used multinomial-logit model is limiting due to its restrictive mean-variance structure. For instance, analyzing count data from the recent RNA-seq technology by the multinomial-logit model leads to serious errors in hypothesis testing. The ubiquity of overdispersion and complicated correlation structures among multivariate counts calls for more flexible regression models. In this article, we study some generalized linear models that incorporate various correlation structures among the counts. Current literature lacks a treatment of these models, partly because they do not belong to the natural exponential family. We study the estimation, testing, and variable selection for these models in a unifying framework. The regression models are compared on both synthetic and real RNA-seq data. Supplementary materials for this article are available online.

Highlights

  • Multivariate count data abound in modern application areas such as genomics, sports, imaging analysis, and text mining

  • We examine regression models for multivariate counts with more flexible mean-covariance and correlation structure

  • We propose a unifying framework, the iteratively reweighted Poisson regression (IRPR), for the MLE of the four regression models


Summary

Introduction

Multivariate count data abound in modern application areas such as genomics, sports, imaging analysis, and text mining. The multinomial model is limiting due to its specific mean-variance structure and the implicit assumption that individual counts in the response vector are negatively correlated. We examine regression models for multivariate counts with more flexible mean-covariance and correlation structure. Parameter estimation in these models is typically hard because they do not belong to the exponential family. We propose a unifying iteratively reweighted Poisson regression (IRPR) method for maximum likelihood estimation. IRPR is stable, scalable to high-dimensional data, and simple to implement using existing software. Our methods are implemented in the R package and Matlab toolbox mglm (Zhang and Zhou, 2015).
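The restriction described above can be made concrete with a small numerical sketch. The code below is illustrative only and is not taken from the article or from the mglm package: it computes the covariance matrix of a multinomial count vector, n(diag(p) - pp^T), whose off-diagonal entries are always negative, and compares it with the covariance of a Dirichlet-multinomial vector. The Dirichlet-multinomial is chosen here as an assumption, simply because it is a familiar overdispersed alternative; the function names and parameter values are hypothetical.

```python
def multinomial_cov(n, p):
    """Covariance of a Multinomial(n, p) count vector: n * (diag(p) - p p^T)."""
    d = len(p)
    return [
        [n * (p[i] * (1 - p[i]) if i == j else -p[i] * p[j]) for j in range(d)]
        for i in range(d)
    ]


def dirichlet_multinomial_cov(n, alpha):
    """Covariance of a Dirichlet-multinomial(n, alpha) count vector.

    It equals the multinomial covariance at p = alpha / sum(alpha),
    inflated by the overdispersion factor (n + a0) / (1 + a0).
    """
    a0 = sum(alpha)
    p = [a / a0 for a in alpha]
    factor = (n + a0) / (1 + a0)
    return [[factor * v for v in row] for row in multinomial_cov(n, p)]


# Hypothetical example values, chosen only for illustration.
n = 20
alpha = [2.0, 3.0, 5.0]
p = [a / sum(alpha) for a in alpha]

mn = multinomial_cov(n, p)
dm = dirichlet_multinomial_cov(n, alpha)

for name, cov in (("multinomial", mn), ("Dirichlet-multinomial", dm)):
    print(name)
    for row in cov:
        print("  " + "  ".join(f"{v:8.3f}" for v in row))
```

With the same mean vector, the Dirichlet-multinomial variances are larger than the multinomial ones whenever n > 1, which is the kind of overdispersion the multinomial model cannot represent. Note that the pairwise covariances remain negative in this particular alternative, so it relaxes the mean-variance restriction but not the negative-correlation one.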

