Abstract

Focusing on point-scale random variables, i.e. variables whose support consists of the first m positive integers, we discuss how to build a joint distribution with pre-specified marginal distributions and Pearson's correlation rho. After recalling that the desired value of rho is not free to vary between -1 and +1, but generally ranges over a narrower interval whose bounds depend on the two marginal distributions, we devise a procedure that first identifies a class of joint distributions having the desired margins, based on a parametric family of copulas, and then adjusts the copula parameter in order to match the desired correlation. The proposed methodology addresses a need that often arises when assessing the performance and robustness of a new statistical technique: building a large number of replicates of a given dataset that satisfy, on average, some of its features (for example, the empirical marginal distributions and the pairwise linear correlations). The proposal has several advantages, among them allowing for dependence structures other than the Gaussian and being able to tune the copula parameter up to an assigned level of precision for rho at a very small computational cost. Based on this procedure, we also suggest a two-step estimation technique for copula-based bivariate discrete distributions, which can be used as an alternative to full and two-step maximum likelihood estimation. Numerical illustration and empirical evidence are provided through some examples and a Monte Carlo simulation study, involving the CUB distribution and three different copulas; an application to real data is also discussed.
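The restriction on the attainable values of rho can be illustrated numerically. The following is a minimal sketch, not taken from the paper: it assumes the standard result that the extreme correlations are reached under the Fréchet lower and upper bound copulas (countermonotonic and comonotonic dependence), and the function name `correlation_bounds` is hypothetical.

```python
from itertools import accumulate


def correlation_bounds(p1, p2):
    """Return (rho_min, rho_max) for X ~ p1 on 1..m1 and Y ~ p2 on 1..m2.

    p1 and p2 are lists of marginal probabilities summing to 1.
    """
    def cdf(p):
        # Cumulative probabilities with a leading 0, so F[k] = P(X <= k).
        return [0.0] + list(accumulate(p))

    def moments(p):
        mean = sum(k * pk for k, pk in enumerate(p, start=1))
        var = sum(k * k * pk for k, pk in enumerate(p, start=1)) - mean ** 2
        return mean, var

    def expected_product(copula):
        # Joint pmf obtained by differencing the copula on the grid of
        # marginal cdf values; E[XY] is accumulated cell by cell.
        F1, F2 = cdf(p1), cdf(p2)
        total = 0.0
        for i in range(1, len(p1) + 1):
            for j in range(1, len(p2) + 1):
                mass = (copula(F1[i], F2[j]) - copula(F1[i - 1], F2[j])
                        - copula(F1[i], F2[j - 1]) + copula(F1[i - 1], F2[j - 1]))
                total += i * j * mass
        return total

    mu1, v1 = moments(p1)
    mu2, v2 = moments(p2)
    sd = (v1 * v2) ** 0.5

    def upper(u, v):   # comonotonic case (Frechet upper bound)
        return min(u, v)

    def lower(u, v):   # countermonotonic case (Frechet lower bound)
        return max(u + v - 1.0, 0.0)

    return ((expected_product(lower) - mu1 * mu2) / sd,
            (expected_product(upper) - mu1 * mu2) / sd)
```

For example, with margins (0.5, 0.5) and (0.9, 0.1) on the support {1, 2}, the function returns bounds of about -1/3 and +1/3, whereas two identical uniform margins recover the full interval from -1 to +1.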

Highlights

  • Datasets arising in the social sciences often contain ordinal variables

  • There are several statistical models and techniques that can be employed for handling multivariate ordinal data without trying to quantify their ordered categories. (The review by Liu and Agresti (2005) and the later textbook of Agresti (2010) give a thorough treatment.) Among them, correlation models and association models both study departures from independence in contingency tables and involve the assignment of scores to the categories of the row and column variables in order to maximize the relevant measure of relationship

  • We recall nonlinear principal component analysis (NLPCA), which is a special case of a multivariate reduction technique named homogeneity analysis and which can be usefully applied in customer satisfaction surveys (Ferrari and Manzi 2010) for mapping the observed ordinal variables into a one-dimensional quantitative variable

Summary

Introduction

Datasets arising in the social sciences often contain ordinal variables. Sometimes they are genuine ordered assessments (judgements, preferences, degree of liking for a product or agreement with a statement, etc.), whereas in other circumstances they are discretized or categorized for convenience (age classes, educational achievement, levels of blood pressure, etc.). The former situation often arises when a survey is administered to a group of people being studied, e.g. questionnaires submitted by a company to its customers with the aim of assessing their level of satisfaction with a product or service the company has provided.

Describing a real phenomenon by creating mirror images and imperfect proxies of the (partially) unknown underlying population in a repeated manner allows researchers to study the performance of their statistical methods through simulated data replicates that mimic the real data characteristics of interest in any given setting (Demirtas and Yavuz 2015; Demirtas and Vardar-Acar 2017). This is often necessary since exact analytic results are seldom available for finite sample sizes, and simulation is required to assess the reliability, validity, and plausibility.

In Section 3, we first state the problem of finding a joint probability function with assigned margins and correlation in general terms; we then focus on a particular class of joint distributions, recalling how to build copula-based bivariate discrete distributions; finally, we describe the proposed procedure for inducing a desired value of correlation between two point-scale variables.
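The two ingredients just mentioned, building a copula-based bivariate discrete distribution and tuning the copula parameter until a target correlation is reached, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' algorithm: it uses the Plackett copula (one of the families listed in the outline) in its standard closed form, and a plain bisection on log(theta), which relies on the correlation being non-decreasing in theta. The names `plackett`, `joint_pmf`, `correlation` and `calibrate_theta`, as well as the search range for theta, are hypothetical choices.

```python
import math
from itertools import accumulate


def plackett(u, v, theta):
    """Plackett copula C(u, v; theta), theta > 0; theta = 1 is independence."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    if u >= 1.0:
        return v
    if v >= 1.0:
        return u
    if abs(theta - 1.0) < 1e-12:
        return u * v
    s = 1.0 + (theta - 1.0) * (u + v)
    arg = max(s * s - 4.0 * u * v * theta * (theta - 1.0), 0.0)
    return (s - math.sqrt(arg)) / (2.0 * (theta - 1.0))


def joint_pmf(p1, p2, theta):
    """Joint pmf on {1..m1} x {1..m2} with margins p1, p2 (as a nested list)."""
    F1 = [0.0] + list(accumulate(p1))
    F2 = [0.0] + list(accumulate(p2))
    F1[-1] = F2[-1] = 1.0  # guard against rounding in the cumulative sums
    return [[plackett(F1[i], F2[j], theta) - plackett(F1[i - 1], F2[j], theta)
             - plackett(F1[i], F2[j - 1], theta)
             + plackett(F1[i - 1], F2[j - 1], theta)
             for j in range(1, len(F2))] for i in range(1, len(F1))]


def correlation(pmf):
    """Pearson correlation of (X, Y) with supports 1..m1 and 1..m2."""
    px = [sum(row) for row in pmf]
    py = [sum(col) for col in zip(*pmf)]
    ex = sum((i + 1) * p for i, p in enumerate(px))
    ey = sum((j + 1) * p for j, p in enumerate(py))
    vx = sum((i + 1) ** 2 * p for i, p in enumerate(px)) - ex ** 2
    vy = sum((j + 1) ** 2 * p for j, p in enumerate(py)) - ey ** 2
    exy = sum((i + 1) * (j + 1) * p
              for i, row in enumerate(pmf) for j, p in enumerate(row))
    return (exy - ex * ey) / math.sqrt(vx * vy)


def calibrate_theta(p1, p2, rho_target, tol=1e-8, max_iter=200):
    """Bisection on log(theta) until |rho(theta) - rho_target| < tol."""
    lo, hi = -15.0, 15.0  # assumed search range for log(theta)

    def rho(log_theta):
        return correlation(joint_pmf(p1, p2, math.exp(log_theta)))

    if not rho(lo) <= rho_target <= rho(hi):
        raise ValueError("target correlation not attainable in search range")
    mid = 0.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        r = rho(mid)
        if abs(r - rho_target) < tol:
            break
        if r < rho_target:
            lo = mid
        else:
            hi = mid
    return math.exp(mid)
```

A call such as `calibrate_theta([0.1, 0.2, 0.3, 0.4], [0.25] * 4, 0.5)` returns a value of theta for which `joint_pmf` has the requested margins and a correlation within `tol` of 0.5. A target outside the attainable interval raises an error instead of silently returning the nearest endpoint.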

Attainable correlations between two random variables
Statement of the problem
Generating bivariate discrete distributions having the pre‐specified margins
The Gauss copula
The Plackett copula
Extension to multivariate context
Pseudo‐random simulation
Application to CUB random variables
Inferential aspects
Full maximum likelihood
Two‐step maximum likelihood
Monte Carlo study
Empirical analysis
Conclusions