Abstract

Count data in clinical studies often contain excess zeros. These zeros usually represent a disease-free state. Although disease (event) free at the time, some of these individuals might be at a high risk of having the putative outcome, while others may be at low or no such risk. We postulate that each zero is one of two types, either a 'low risk' or a 'high risk' zero for the disease process in question. Low-risk zeros can arise from the absence of risk factors for disease initiation/progression and/or from a very early stage of the disease. High-risk zeros can arise from the presence of significant risk factors for disease initiation/progression or, in rare situations, from misclassification, more specific diagnostic tests, or being below the level of detection. We use zero-inflated models, which allow us to assume that zeros arise from one of two separate latent processes, one giving low-risk zeros and the other high-risk zeros, and subsequently propose a strategy to identify and classify them as such. To illustrate, we use data on the number of involved nodes in breast cancer patients. Of the 1152 patients studied, 38.8% were node-negative (zeros). The model predicted that about a third of the negative nodes (11.4% of the sample) are at high risk and the remainder (27.4%) are at low risk of nodal positivity. Posterior probability-based classification was more appropriate than the other methods. Our approach indicates that some node-negative patients may need to be re-assessed for their diagnosis of nodal positivity and/or for future clinical management of their disease. The approach developed here is applicable to any scenario where the disease or outcome can be characterized by count data.

Highlights

  • Classifying individuals who are at high risk of a certain outcome is an important goal in clinical practice and research (Lewis, 2000)

  • We demonstrated that the Zero-Inflated Negative Binomial (ZINB) model fit and described the data well, with the number of involved nodes as the outcome

  • The ZINB model predicted 38.6% negative nodes in the data set, including 27.4% low-risk negative nodes

Summary

Introduction

Classifying individuals who are at high risk of a certain outcome is an important goal in clinical practice and research (Lewis, 2000). Count outcome data occur often in medical research, for example, number of days with physical activity, number of adverse cardiac events, number of recurrences, number of attacks, number of seizures in epilepsy, number of hospital admissions, number of alcoholic drinks consumed, etc. Data collected on such outcomes often have an excess of zeros (negative outcomes, no disease, or no event) (Slymen et al., 2006). Excess zeros in count outcome data may occur because of the presence of additional subjects with no risk of the event of interest. Studies refer to such zeros as structural zeros, while zeros from subjects who are at risk of the event in question are referred to as sampling zeros. Regardless of the type of zero, there is a need to account for the data heterogeneity due to excess zeros in order to draw appropriate inferences and predictions when modeling count data.
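The distinction between structural and sampling zeros can be made concrete with a small sketch of how a zero-inflated negative binomial model would assign a posterior probability to each type of zero. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the NB2 parameterization (variance = mu + alpha * mu^2), the example parameter values, and the 0.5 cut-off are all hypothetical. It treats structural zeros as 'low risk' and sampling zeros as 'high risk', following the description above.

```python
from typing import Tuple


def nb_prob_zero(mu: float, alpha: float) -> float:
    """P(Y = 0) for a negative binomial count with mean mu and dispersion alpha.

    Assumes the NB2 parameterization, where variance = mu + alpha * mu**2.
    """
    if mu < 0 or alpha <= 0:
        raise ValueError("mu must be >= 0 and alpha must be > 0")
    return (1.0 + alpha * mu) ** (-1.0 / alpha)


def zero_posterior(pi: float, mu: float, alpha: float) -> Tuple[float, float]:
    """Posterior probabilities for an observed zero under a ZINB model.

    pi is the mixing probability of the structural-zero process.
    Returns (P(structural | y = 0), P(sampling | y = 0)).
    """
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must lie in [0, 1]")
    p0_count = nb_prob_zero(mu, alpha)       # zero produced by the count process
    p_zero = pi + (1.0 - pi) * p0_count      # marginal probability of a zero
    p_structural = pi / p_zero               # Bayes' rule
    return p_structural, 1.0 - p_structural


def classify_zero(pi: float, mu: float, alpha: float, threshold: float = 0.5) -> str:
    """Label an observed zero; the 0.5 threshold is an illustrative assumption."""
    p_structural, _ = zero_posterior(pi, mu, alpha)
    return "low risk" if p_structural >= threshold else "high risk"


# Hypothetical subject-level parameters, not estimates from the paper.
print(zero_posterior(pi=0.3, mu=2.0, alpha=1.0))
print(classify_zero(pi=0.1, mu=0.5, alpha=1.0))
```

In a fitted model, pi and mu would typically depend on each patient's covariates, so two patients with the same observed zero count can receive different classifications.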
