Abstract

Aggregate data arises in situations where survey research or other means of collecting individual-level data are either infeasible or inefficient. The recent increase in the use of aggregate data in statistics and allied fields, including epidemiology, education and the social sciences, has arisen for a number of reasons. These include the questionable reliability of estimates when sensitive information is required, the imposition of strict confidentiality policies on data by government and other organisational bodies and, in some contexts, the impossibility of collecting the information that is needed. In this paper we present a novel approach to quantifying the statistical significance of the extent of association that exists between two dichotomous variables when only the aggregate data is available. This is achieved by examining a recently proposed index, called the aggregate association index (or the AAI), developed by Beh (2008 and 2010), which quantifies the overall extent of association about individuals that may exist at the aggregate level when individual-level data is not available. The applicability of the technique is demonstrated using the leukaemia relapse data of Cave et al. (1998). This data is presented in the form of a contingency table that cross-classifies the follow-up status of leukaemia relapse by whether or not cancer traces were found on the basis of polymerase chain reaction (PCR), a modern method for detecting cancerous cells in the body that was assumed to be superior to the conventional method of that period, microscopic identification. Assuming that the joint cell frequencies of this table are not available, and that the only available information is contained in the aggregate data, we first quantify the extent of association that exists between the two variables by calculating the AAI. This index shows that the likelihood of association is high.
As the AAI has been developed by exploiting Pearson's chi-squared statistic, it inherently suffers from the well-known large sample size effect, which can overshadow the true nature of the association shown in the aggregate data of a given table. However, in this paper we show that the impact of sample size can be isolated by generating a pseudo population of 2x2 tables under the given sample size. The focus of this paper is therefore to present an approach that helps answer the question "Is this high AAI value statistically significant or not?" using aggregate data only. The answer to this question lies, we believe, in the calculation of the p-value of the nominated index. We present a new method of numerically quantifying the p-value of the AAI, thereby gaining new insights into the statistical significance of the association between two dichotomous variables when only aggregate-level information is available. The pseudo p-value approach suggested in this paper enhances the applicability of the AAI and can thus be considered a valuable addition to the literature on aggregate data analysis.
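
To make the setting concrete, the following is a minimal sketch, not the AAI itself and not the pseudo p-value procedure of the paper. It only illustrates the two ingredients the abstract names: the marginal totals of a 2x2 table restrict the unknown cell frequency to a bounded range, and Pearson's chi-squared statistic grows with the sample size even when the proportions stay the same. The function names and the marginal totals are hypothetical and are not taken from Cave et al. (1998); the critical value 3.841 is the usual 5% point of the chi-squared distribution with one degree of freedom.

```python
def n11_bounds(r1, c1, n):
    """Smallest and largest feasible values of cell (1,1) given the margins.

    r1 is the first row total, c1 the first column total, n the sample size.
    """
    return max(0, r1 + c1 - n), min(r1, c1)


def chi_squared(n11, r1, c1, n):
    """Pearson's chi-squared statistic of the 2x2 table implied by n11 and the margins."""
    r2, c2 = n - r1, n - c1
    n12 = r1 - n11
    n21 = c1 - n11
    n22 = n - r1 - c1 + n11
    return n * (n11 * n22 - n12 * n21) ** 2 / (r1 * r2 * c1 * c2)


def share_significant(r1, c1, n, critical=3.841):
    """Share of the feasible 2x2 tables whose chi-squared exceeds the critical value.

    Every feasible value of n11 is given equal weight; this is an
    illustrative simplification, not the weighting used by the AAI.
    """
    lo, hi = n11_bounds(r1, c1, n)
    values = [chi_squared(k, r1, c1, n) for k in range(lo, hi + 1)]
    return sum(v > critical for v in values) / len(values)


if __name__ == "__main__":
    # Hypothetical margins: first row total 20, first column total 30, n = 100.
    print(n11_bounds(20, 30, 100))
    print(share_significant(20, 30, 100))
    # Same marginal proportions with ten times the sample size.
    print(share_significant(200, 300, 1000))
```

With these hypothetical margins the share of feasible tables that exceed the critical value rises from about 0.67 at n = 100 to about 0.89 at n = 1000, although the marginal proportions are identical. This is the kind of sample size effect that the abstract says must be isolated before a high index value can be interpreted.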
