Abstract
Probit and logistic regression models are members of the family of generalized linear models and are used to estimate the functional relationship between a dichotomous dependent variable and a set of independent variables. The objective of this study is to compare the performance of the probit and logistic regression models under multivariate normality across different conditions. A Monte Carlo simulation study was carried out in which artificial datasets were generated under multivariate normality using the latent variable approach, with different variance-covariance matrices, sample sizes and prevalences. For each combination, 1000 simulations were carried out. Probit and logistic regression analyses were performed and compared using parameter estimates, standard errors, the likelihood ratio test, root mean square errors (RMSEs), null and residual deviances, different pseudo-R² measures, AIC, BIC and correct percent prediction (CPP). A real data set was also used to compare the efficiency of the models. The AIC, BIC and RMSE values showed that the logit and probit models fit the data equally well in all combinations of sample size, correlation structure and proportion of outcome. However, the sensitivity, specificity and CPP values showed that the logit model predicts the outcome better than the probit model in most situations. The results showed that the probit and logit models perform equally well under multivariate normality.
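The simulation design described in the abstract can be illustrated with a minimal sketch of a single replicate. This is not the authors' code, and it rests on several assumptions: only two standard-normal predictors with correlation `rho` are used, the latent coefficients (0.8 and 0.5), the seed and all function names are hypothetical, the binary outcome is obtained by cutting the latent variable at the sample quantile that gives the requested prevalence, and both models are fitted by Fisher scoring using only the Python standard library instead of a statistics package.

```python
import math
import random


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def logistic_cdf(z):
    # Equivalent to 1 / (1 + exp(-z)), written with tanh to avoid overflow.
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def logistic_pdf(z):
    p = logistic_cdf(z)
    return p * (1.0 - p)


def simulate(n, rho, prevalence, rng):
    """One artificial dataset: correlated normal predictors, latent normal outcome."""
    rows, latent = [], []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rho * x1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        rows.append((1.0, x1, x2))  # leading 1.0 is the intercept column
        # Hypothetical latent coefficients; the paper's values are not given here.
        latent.append(0.8 * x1 + 0.5 * x2 + rng.gauss(0.0, 1.0))
    # Cut the latent variable so that a fraction `prevalence` of outcomes equals 1.
    cut = sorted(latent)[int(round((1.0 - prevalence) * n))]
    return rows, [1 if v >= cut else 0 for v in latent]


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [list(a[i]) + [b[i]] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, k):
            f = m[r][c] / m[c][c]
            for j in range(c, k + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * k
    for i in reversed(range(k)):
        x[i] = (m[i][k] - sum(m[i][j] * x[j] for j in range(i + 1, k))) / m[i][i]
    return x


def clipped_mean(cdf, beta, x):
    eta = sum(b * v for b, v in zip(beta, x))
    return eta, min(max(cdf(eta), 1e-10), 1.0 - 1e-10)


def fit_binary_glm(rows, y, cdf, pdf, iters=50, tol=1e-8):
    """Fisher scoring for P(y = 1 | x) = cdf(x'beta); returns (beta, log-likelihood)."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for x, yi in zip(rows, y):
            eta, mu = clipped_mean(cdf, beta, x)
            d, var = pdf(eta), mu * (1.0 - mu)
            for a in range(k):
                score[a] += x[a] * (yi - mu) * d / var
                for c in range(k):
                    info[a][c] += x[a] * x[c] * d * d / var
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    loglik = 0.0
    for x, yi in zip(rows, y):
        _, mu = clipped_mean(cdf, beta, x)
        loglik += math.log(mu) if yi else math.log(1.0 - mu)
    return beta, loglik


rng = random.Random(2024)  # arbitrary seed for reproducibility
rows, y = simulate(n=500, rho=0.5, prevalence=0.3, rng=rng)
results = {}
links = (("logit", logistic_cdf, logistic_pdf), ("probit", normal_cdf, normal_pdf))
for name, cdf, pdf in links:
    beta, ll = fit_binary_glm(rows, y, cdf, pdf)
    k, n = len(beta), len(y)
    results[name] = {"beta": beta, "loglik": ll,
                     "aic": 2 * k - 2 * ll, "bic": k * math.log(n) - 2 * ll}
    print(name, [round(b, 3) for b in beta],
          round(results[name]["aic"], 2), round(results[name]["bic"], 2))
```

A full study along the lines of the abstract would repeat this over 1000 replicates for every combination of sample size, covariance structure and prevalence, and would also record the other measures listed there (standard errors, deviances, pseudo-R², sensitivity, specificity and CPP).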
Highlights
Continuous variables are often categorized, a process known as discretization
Most outcome variables in biomedical research are categorical in nature
The Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values were almost the same for the logit (AIC = 209.76, BIC = 231.91) and probit (AIC = 210.10, BIC = 232.26) models, indicating that both models fit the data set equally well
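For reference, the standard definitions of the two criteria are given below. The symbols are not from the original text: k denotes the number of estimated parameters, n the sample size and \hat{L} the maximized likelihood.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L}, \qquad
\mathrm{BIC} - \mathrm{AIC} = k\,(\ln n - 2)
```

Because the logit and probit models are fitted to the same data with the same predictors, k and n are identical, so the gap between BIC and AIC should be the same for both. The reported values agree with this up to rounding: 231.91 - 209.76 = 22.15 for logit and 232.26 - 210.10 = 22.16 for probit. The remaining difference between the models (about 0.34 in AIC) therefore reflects only a difference of roughly 0.17 in the maximized log-likelihood.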
Summary
Continuous variables are often categorized, a process known as discretization. The items in the Hamilton Rating Scale for Depression (HRSD) questionnaire, for example, are scored on 3-point to 5-point scales. A subject is considered normal if the HRSD score is between 0 and 7; otherwise there is evidence of depression (Hamilton, 1960). In such a scenario, where the outcome variable is dichotomous, binary logistic and probit regression models are frequently used statistical methods for predicting the outcome variable from a set of independent variables (Chai and Draxler, 2014).