Abstract

The Poisson and multinomial distributions are natural models for count and frequency data on categorical variables. When there are only two categories, the multinomial distribution reduces to the binomial. These distributions are skewed, and their variance changes with the mean, so standard methods for normally distributed variables, such as analysis of variance (ANOVA) and the general linear model, cannot reasonably be applied. When events are counted and classified by two or more categorical variables, or when sample units are classified, the counts or frequencies are often presented in tabular form as contingency tables. Log-linear models provide a general framework for analyzing association in multidimensional contingency tables. When a set of nested models is formulated, higher-order interaction terms may be removed successively, in analogy with ANOVA models for normally distributed variables. A subset of these models, the logit models, may be used to assess the effect of explanatory variables on a categorical response variable. Another important subset of the log-linear models is the graphical models, which allow the joint probability to be factorized into a product of conditional distributions. The log-linear and logit models are special cases of a wider class, the generalized linear models. In this framework, the effect of continuous explanatory variables may be modeled in regression-like models. An example is logistic regression, in which the effect of explanatory variables on a response probability is modeled. When the assumption of independence is violated, overdispersion may result. A method for modeling overdispersion is also presented.
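
As a rough illustration of removing an interaction term from a log-linear model, the sketch below treats the simplest case: a two-way contingency table, where dropping the interaction from the saturated model leaves the independence model. It is a minimal sketch rather than the method of the article itself. The function names `independence_fit` and `deviance` and the 2x2 counts are hypothetical, chosen only for illustration.

```python
import math


def independence_fit(table):
    """Fitted counts under the independence log-linear model.

    For a two-way table the model log(mu_ij) = lambda + lambda_i + lambda_j
    has the closed-form fit (row total * column total) / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def deviance(table, fitted):
    """Likelihood-ratio statistic G^2 = 2 * sum(obs * log(obs / fitted)).

    Cells with an observed count of zero contribute nothing to the sum.
    """
    g2 = 0.0
    for obs_row, fit_row in zip(table, fitted):
        for obs, fit in zip(obs_row, fit_row):
            if obs > 0:
                g2 += 2.0 * obs * math.log(obs / fit)
    return g2


# Hypothetical 2x2 table of counts (illustrative data, not from the article).
counts = [[30, 10],
          [20, 40]]

fitted = independence_fit(counts)
g2 = deviance(counts, fitted)

# Degrees of freedom for dropping the interaction term: (rows - 1) * (cols - 1).
df = (len(counts) - 1) * (len(counts[0]) - 1)

print("fitted counts:", fitted)
print("G^2 =", round(g2, 3), "on", df, "degree(s) of freedom")
```

A large G^2 relative to a chi-square distribution with `df` degrees of freedom suggests that the interaction term should be kept, that is, that the two classifying variables are associated. For tables with more than two dimensions the fitted counts generally have to be computed iteratively, which this sketch does not attempt.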
