Correction to: Sparse correspondence analysis for large contingency tables
- Research Article
5
- 10.2312/pe/eurovast/eurova11/053-056
- Jan 1, 2011
We present the Contingency Wheel, a visual method for finding and analyzing associations in a large n × m contingency table with m < 100 and n being two to three orders of magnitude larger than m. The method is demonstrated on a large table from the Book-Crossing dataset, which counts the number of ratings each book received from each country. It enables finding books that received a disproportionately high number of ratings from a specific country. It further allows visual analysis of what these books have in common and of which other countries they are also highly associated with. Pairs of similar countries can further be identified (in the sense that many books are associated with both countries). Compared with existing visual methods, our approach enables analyzing and gaining insight into larger tables.
- Research Article
2
- 10.1111/j.2517-6161.1990.tb01772.x
- Sep 1, 1990
- Journal of the Royal Statistical Society Series B: Statistical Methodology
Discussion of the Papers by Edwards, and Wermuth and Lauritzen
- Book Chapter
5
- 10.1016/b978-012299045-8/50033-4
- Jan 1, 1998
- Visualization of Categorical Data
Chapter 29 - Analysis of Contingency Tables Using Graphical Displays Based on the Mixture Index of Fit
- Research Article
9
- 10.2307/2289283
- Sep 1, 1988
- Journal of the American Statistical Association
This article discusses through three examples several new methods to aid in the analysis of large contingency tables. The general goal is to give better understanding of specific contingency tables, both by comparing how various log-linear/logistic models fit and through clearer interpretations of the resulting fits. For model selection, we show how to focus on a subset of simple, good-fitting models, beginning with a plot of a goodness-of-fit statistic versus residual degrees of freedom for all of the fitted models. To assess whether a particular model is adequate, we demonstrate that certain plots of residuals can reveal interesting effects that are often otherwise hidden. For model summarization and interpretation, we plot odds-ratio factors with confidence intervals to show the effects of explanatory variables in a concise and appealing way. The first example involves the relationship of job satisfaction to demographic variables for craft employees of a large corporation. The data presented consist of a five-way contingency table with about 10,000 counts. Job satisfaction for such employees increased with age and was higher in the Southwest and West than in the Northeast. Of four race-by-sex groups, the most satisfied was nonwhite males; the least satisfied was nonwhite females. Another example gives a six-way table with about 1,200 counts concerning whether or not high-school students think they will need mathematics in their future work. Among other results, for students planning to take a job right after graduation, those from suburban schools had odds about 2.6 times those from urban schools of thinking that mathematics will be useful. Moreover, among urban students, males had odds of finding mathematics useful about 2.1 times those for females, but there was little difference between the odds for males and females among suburban students. The third example, drawn from the literature, relates knowledge about cancer to four dichotomous variables. We compare our analysis with earlier ones.
- Research Article
31
- 10.1080/01621459.1988.10478640
- Sep 1, 1988
- Journal of the American Statistical Association
This article discusses through three examples several new methods to aid in the analysis of large contingency tables. The general goal is to give better understanding of specific contingency tables, both by comparing how various log-linear/logistic models fit and through clearer interpretations of the resulting fits. For model selection, we show how to focus on a subset of simple, good-fitting models, beginning with a plot of a goodness-of-fit statistic versus residual degrees of freedom for all of the fitted models. To assess whether a particular model is adequate, we demonstrate that certain plots of residuals can reveal interesting effects that are often otherwise hidden. For model summarization and interpretation, we plot odds-ratio factors with confidence intervals to show the effects of explanatory variables in a concise and appealing way. The first example involves the relationship of job satisfaction to demographic variables for craft employees of a large corporation. The data presented c...
- Research Article
47
- 10.2307/2529344
- Mar 1, 1976
- Biometrics
In many practical situations, investigators are forced to study the structure underlying the crossclassification of several categorical variables via tables of observed counts in which the observations corresponding to certain sets of cells are indistinguishable. Methods are presented for the analysis of such contingency tables with incompletely cross-classified data via loglinear models. The method of maximum likelihood is used to estimate the expected cell counts which are then used to test the goodness-of-fit of the model. Extensions to incomplete (or truncated) contingency tables are indicated and several examples are given.
- Research Article
95
- 10.1016/j.jspi.2007.03.022
- Mar 31, 2007
- Journal of Statistical Planning and Inference
The common view of the history of contingency tables is that it begins in 1900 with the work of Pearson and Yule, but in fact it extends back at least into the 19th century. Moreover, it remains an active area of research today. In this paper we give an overview of this history focussing on the development of log-linear models and their estimation via the method of maximum likelihood. Roy played a crucial role in this development with two papers co-authored with his students, Mitra and Marvin Kastenbaum, at roughly the mid-point temporally in this development. Then we describe a problem that eluded Roy and his students, that of the implications of sampling zeros for the existence of maximum likelihood estimates for log-linear models. Understanding the problem of non-existence is crucial to the analysis of large sparse contingency tables. We introduce some relevant results from the application of algebraic geometry to the study of this statistical problem.
- Research Article
70
- 10.1093/biomet/70.3.553
- Jan 1, 1983
- Biometrika
Some aspects of the analysis of multidimensional contingency tables in practice are considered. The class of graphical models defined by Darroch, Lauritzen & Speed (1980) is described, strategies for model selection based on this class are considered and an example is given.
- Research Article
67
- 10.2307/2528967
- Mar 1, 1972
- Biometrics
Several authors have recently considered the analysis of contingency tables containing cells which are missing, a priori zero, or otherwise specified. Such tables are usually referred to as being incomplete. This paper reexamines this recent literature and shows how the methodology can be extended to the analysis of incomplete multi-way cross-classifications. Several examples are given, and the methods developed here are examined in the light of these examples. The emphasis is on the use of techniques for the actual analysis of data and on the ties with the analysis of complete multi-way tables.
- Research Article
2
- 10.1002/sim.2447
- Dec 12, 2005
- Statistics in Medicine
Rationalization of antibiotic therapy in the management of infectious diseases is helped by a knowledge of the patterns of sensitivity and resistance of bacteria to antibiotics and their possible changes both in time and from one hospital unit to another. In this paper we present the results regarding the sensitivities of several groups of bacteria and different Units of the S.Orsola-Malpighi Hospital of Bologna in the period 1995-1997. We apply recent methods of analysis of ordinal contingency tables that rely on stochastic ordering of the rows to test the assumption that a decrease (or increase) in sensitivity of bacteria to specific antibiotics has taken place against the alternative that no such thing has happened. In most cases the results seem to indicate an increase in sensitivity rather than what was expected, namely the opposite.
- Research Article
5
- 10.1007/s10463-007-0153-1
- Oct 3, 2007
- Annals of the Institute of Statistical Mathematics
Analysis of large dimensional contingency tables is rather difficult. Fienberg and Kim (1999, Journal of American Statistical Association, 94, 229–239) studied the problem of combining conditional (on single variable) log-linear structures for graphical models to obtain partial information about the full graphical log-linear model. In this paper, we consider the general log-linear models and obtain explicit representation for the log-linear parameters of the full model based on that of conditional structures. As a consequence, we give conditions under which a particular log-linear parameter is present or not in the full model. Some of the main results of Fienberg and Kim follow from our results. The explicit relationships between full model and the conditional structures are also presented. The connections between conditional structures and the layer structures are pointed out. We also investigate the hierarchical nature of the full model, based on conditional structures. Kim (2006, Computational Statistics and Data Analysis, 50, 2044–2064) analyzed graphical log-linear models based on conditional log-linear structures, when a set of variables is conditioned. For this case, we employ the Möbius inversion technique to obtain the interaction parameters of the full log-linear model, and discuss their properties. The hierarchical nature of the full model is also studied based on conditional structures. This result could also be used effectively for model selection. As applications of our results, we discuss several typical examples, including a real-life example.
- Research Article
34
- 10.1016/0304-4076(83)90099-4
- May 1, 1983
- Journal of Econometrics
Loglinear models and categorical data analysis with psychometric and econometric applications
- Research Article
4
- 10.1093/imaman/9.3.241
- Jul 1, 1998
- IMA Journal of Management Mathematics
Graphical models simplify the analysis of multivariate observations by summarizing conditional independences in the data. Variables are represented by nodes, and the absence of an edge between two nodes signifies their conditional independence. While graphical modelling has been used in several applications of statistics, credit scoring has only recently been suggested as a suitable candidate. This paper suggests the following potential uses for graphical models: to display and interpret the associations between variables taken from a credit-card application form; to compare the credit scoring of subpopulations; to give a description of the credit-scoring selection process in terms of influence diagrams; and to assess the effect of selection bias and stratification on the interdependency of variables. These methods are discussed in relation to the analysis of a subset of variables from a stratified sample of credit-card applicants. The large number of variables measured in an application form requires the statistical analysis of large sparse contingency tables. It is shown here that tractable graphical models can be extracted from fitting the relatively simple all-two-way interaction model.
- Research Article
58
- 10.1007/s00127-002-0569-0
- Sep 1, 2002
- Social Psychiatry and Psychiatric Epidemiology
This study tests the hypothesis that gender differences in depression diminish after menopause (around the age of 55). Using the 1994 National Population Health Survey, we examine the relationship between age and gender on major depressive disorder in relation to sociodemographic and social covariates using contingency table analyses and multivariate logistic regression. Contingency table and multivariate analyses identify significantly higher rates of depression among women before and after the age period associated with menopause. A series of multivariate analyses controlling for a broad array of social factors also does not lead to any convergence in differences of rates of depression between males and females. Hormone replacement therapy (HRT) does not have a significant impact on these observed relationships. These findings are at odds with a recent study that has identified menopause as a point where gender differences in depression diminish. Further research is required to address this inconsistency.
- Research Article
19
- 10.1007/s102110100050
- Feb 1, 2002
- Acta ethologica
In this paper we present ACTUS2, the second version of ACTUS (Analysis of Contingency Tables Using Simulation). ACTUS2 has many new features, including analysis of data in which dependencies that make some combinations of properties impossible are hypothesized. Because ACTUS2 explicitly simulates such hypotheses, it can be used without loss of accuracy to analyze small amounts of data in large tables with many zeros or very low frequencies. We illustrate these features with two studies of animal behavior: interactions of male individuals with other individuals in groups of captive, mature Triturus marmoratus pygmaeus (newts); and agonistic interactions between pairs of male juvenile Diplodus sargus (the sparid fish, white sea-bream). Both significantly frequent, and significantly infrequent, co-occurrences that had biologically meaningful interpretations were revealed.