Bayesian models identify specific lymphoproliferative disorders associated with hepatitis C virus infection

Raffaele Bruno, Francesco Passamonti, Cesare Astori, Luca Arcaini, Silvia Rizzi, Ester Orlandi, Michele Merli, Marco Paulli, Sara Rattotti, Mario Lazzarino, Cristiana Pascutto

doi:10.1002/ijc.24162

Abstract

Hepatitis C virus (HCV) infection has a worldwide distribution and a significant social impact. The infection is endemic in some countries, such as Japan, Italy and Egypt. HCV plays an etiological role in cirrhosis and hepatocellular carcinoma, and it has been linked to the pathogenesis of type II mixed cryoglobulinemia.1 The role of HCV infection in lymphomagenesis may be related to chronic antigenic stimulation by the virus, similar to that reported for Helicobacter pylori infection in the development of gastric mucosa-associated lymphoid tissue (MALT) lymphoma.2 A causal role of HCV in lymphomagenesis is supported by the regression of lymphoma after eradication of the infection.3-5 In a recent meta-analysis of case-control studies,6 the pooled relative risk of lymphoma among HCV carriers was 2.5; however, the series were too small to allow the identification of specific HCV-related histotypes. In the Epilymph case-control study,7 diffuse large B-cell lymphoma (DLBCL) was the subtype most clearly associated with HCV infection. HCV infection also seemed to increase the risk of marginal zone lymphoma and lymphoplasmacytoid lymphoma, but these results were based on few cases (for instance, 1 HCV-positive patient among 41 cases of splenic marginal zone lymphoma). To provide clear estimates of the risk of non-Hodgkin's lymphoma (NHL) subtypes, the International Lymphoma Epidemiology Consortium (InterLymph) conducted a pooled case-control study (4,784 cases of NHL and 6,269 controls).8 HCV infection was detected in 172 NHL cases (3.60%) and in 169 controls (2.70%). In subtype-specific analyses, HCV infection was associated with marginal zone lymphoma (odds ratio [OR] 2.47), DLBCL (OR 2.24) and lymphoplasmacytic lymphoma (OR 2.57).
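As a quick arithmetic check on the InterLymph counts quoted above, the sketch below computes the crude HCV prevalence in cases and controls and an unadjusted odds ratio with a Woolf (log-scale) 95% interval. The function name `crude_odds_ratio` is illustrative only, and the resulting crude overall OR (roughly 1.35) is not an estimate reported by the study, whose ORs were given per NHL subtype; the sketch only shows how the 3.60% and 2.70% figures arise from the counts.

```python
import math

# Counts quoted for the InterLymph pooled case-control study (ref. 8).
cases_total, cases_hcv = 4784, 172
controls_total, controls_hcv = 6269, 169


def crude_odds_ratio(a, n1, c, n0, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    a: exposed cases, n1: all cases, c: exposed controls, n0: all controls.
    """
    b = n1 - a  # HCV-negative cases
    d = n0 - c  # HCV-negative controls
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(odds_ratio) - z * se_log)
    hi = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lo, hi


prev_cases = cases_hcv / cases_total
prev_controls = controls_hcv / controls_total
or_, lo, hi = crude_odds_ratio(cases_hcv, cases_total, controls_hcv, controls_total)
print(f"prevalence: cases {prev_cases:.2%}, controls {prev_controls:.2%}")
print(f"crude OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```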
Recent retrospective studies report a high HCV seroprevalence among patients with marginal zone lymphomas9-13 and diffuse large B-cell lymphoma.14-16 We applied a Bayesian approach to estimate the prevalence of HCV infection in the different lymphoproliferative disorders and to identify any specific HCV-related entities.
We analyzed 1,434 patients with available HCV serology among 2,621 patients with lymphoproliferative disorders diagnosed between 1992 and 2006 at the Division of Hematology, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy. Median age at diagnosis was 51 years (range, 16–90) and the male/female ratio was 0.9. Antibodies to HCV were detected at initial staging by second- or third-generation ELISA. Approval for this retrospective study, based on archival data, was obtained from the local Institutional Ethics Committee. Data management and analysis were performed in accordance with the Declaration of Helsinki of 1975, as revised in 1983 and 2000. To standardize disease nomenclature, we adopted the classification of lymphoid neoplasms for epidemiologic research proposed by the Pathology Working Group of the International Lymphoma Epidemiology Consortium (InterLymph).17 In addition, we grouped under the term B-cell chronic lymphoproliferative disorders (B-CLPD)19, 20 all cases of non-CLL leukemic lymphoproliferative disorders with a Royal Marsden Hospital score of 3 or lower.18
In the classical statistical approach, hypothesis tests are performed to compare the group-specific maximum likelihood (ML) estimates of prevalence. This procedure may give unsatisfactory results when data are scarce (strong influence of random variability on the estimates and lack of statistical power) and/or when there are many subgroups (multiple testing).
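To put a number on the multiple-testing issue, the sketch below computes the probability of at least one false-positive result when k independent tests are each run at α = 0.05 and every null hypothesis is true. Taking k = 11 (one test per diagnostic group, matching the 11 groups ranked later in this letter) and assuming independent tests are simplifications for illustration; the Bonferroni level is shown only as the usual frequentist correction, not as something used in the study.

```python
def family_wise_error_rate(alpha: float, k: int) -> float:
    """P(at least one false positive) for k independent tests at level alpha,
    assuming all k null hypotheses are true."""
    return 1 - (1 - alpha) ** k


alpha = 0.05
k = 11  # assumed: one test per diagnostic group
fwer = family_wise_error_rate(alpha, k)
bonferroni_alpha = alpha / k  # per-test level that keeps the FWER at or below alpha

print(f"FWER with {k} tests at alpha={alpha}: {fwer:.3f}")
print(f"Bonferroni per-test level: {bonferroni_alpha:.4f}")
```

With 11 groups, the chance of at least one spurious "significant" prevalence difference is above 40%, which is the problem the hierarchical Bayesian models are meant to sidestep.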
The Bayesian approach filters random variability out of the estimates by introducing a hierarchical structure with random effects, and it sidesteps the problem of multiple testing, because parameters can be compared directly by calculating posterior probabilities and credible intervals. We obtained estimates of HCV prevalence in each type of lymphoproliferative disorder from 4 statistical models. Model A: the classical ML estimator for the binomial distribution, with the corresponding 95% confidence intervals (95% CI). Model B: a Bayesian fixed-effects model, based on the assumption that the HCV status of each patient is an independent binary response, so that the number of HCV+ patients in each group is binomial with the group-specific HCV prevalence as event probability. A noninformative prior distribution was specified on each group-specific probability. Model C: a fully Bayesian hierarchical model, based on the assumption that all the logit-transformed prevalence rates are random effects drawn from the same normal prior distribution, so that the prior mean can be interpreted as an “overall prevalence.” The mean and the standard deviation of this prior distribution are 2 hyperparameters, on which noninformative hyperpriors were specified. Model D: a fully Bayesian hierarchical model in which the logit-transformed prevalence of each diagnostic group was assigned one of 2 normal priors, with different mean and standard deviation parameters, each with the same noninformative hyperpriors as in Model C. Parameter estimates were calculated from Markov Chain Monte Carlo (MCMC) samples drawn from the posterior distribution of the parameters. Point estimates are the means of the posterior samples. Interval estimates, namely the 95% credible intervals (CI), are the 2.5th and 97.5th percentiles of the posterior samples.
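The difference between Model A and Model B for a single group can be sketched as follows. The counts are hypothetical, the Wald interval is one common choice for the ML confidence interval, and the uniform Beta(1, 1) prior is an assumed stand-in for the unspecified "noninformative prior"; because that prior is conjugate, the posterior is sampled directly with NumPy rather than by MCMC in WinBUGS. Point and interval estimates are then taken from the samples exactly as described above (mean and 2.5th/97.5th percentiles).

```python
import numpy as np

# Hypothetical counts for one diagnostic group (not data from the study).
hcv_pos, n = 12, 40

# Model A: binomial maximum-likelihood estimate with a Wald 95% confidence interval.
p_ml = hcv_pos / n
se = np.sqrt(p_ml * (1 - p_ml) / n)
ml_ci = (p_ml - 1.96 * se, p_ml + 1.96 * se)

# Model B (sketch): assumed uniform Beta(1, 1) prior, so by conjugacy the
# posterior is Beta(hcv_pos + 1, n - hcv_pos + 1); sample it directly.
rng = np.random.default_rng(seed=1)
draws = rng.beta(hcv_pos + 1, n - hcv_pos + 1, size=200_000)

post_mean = draws.mean()                      # point estimate: mean of the samples
cred_int = np.percentile(draws, [2.5, 97.5])  # 95% credible interval from percentiles

print(f"Model A: {p_ml:.3f} (95% CI {ml_ci[0]:.3f}-{ml_ci[1]:.3f})")
print(f"Model B: {post_mean:.3f} (95% CI {cred_int[0]:.3f}-{cred_int[1]:.3f})")
```

With a flat prior and moderate group sizes the two sets of estimates are close, consistent with the near-identical Model A and Model B results reported in this letter.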
Posterior probabilities (PP) are calculated as the proportion of posterior samples in which the tested hypothesis holds. The Deviance Information Criterion (DIC)21 was adopted for model comparison. DIC is the sum of 2 components: a goodness-of-fit measure (the posterior mean deviance) and a model complexity measure (the effective number of parameters). The model with the smallest DIC can be considered the “best” model, i.e., the model that would best predict a replicate dataset with the same structure as the one currently observed. All Bayesian computations were carried out using WinBUGS 1.4.3 © Imperial College and MRC (UK), available at http://www.mrc-bsu.cam.ac.uk/bugs, and Microsoft Excel 2000 ©. For each Bayesian model, the MCMC algorithm was run for 200,000 iterations after convergence.22, 23
The estimates of HCV seroprevalence according to the 4 models are summarized in Table I and plotted in Figure 1. The point estimates and 95% CIs obtained from Model B overlap almost completely with those obtained with the classical ML approach (Model A). The estimated prevalence ranged from 5% for Hodgkin's lymphoma and hairy cell leukemia to more than 30% for DLBCL (43%), MALT lymphoma (44%) and splenic marginal zone lymphoma (splenic MZL, 34%). The group-specific prevalence estimates obtained with Model C were slightly pulled toward the overall mean value (p0 in Table I; central point of the dark grey band in Fig. 1). However, it was still possible to single out histotypes with high HCV seroprevalence (DLBCL 43%, MALT lymphoma 43%, splenic MZL 33%, low-grade NHL NOS 47%). In these groups, the 95% CI showed little or no overlap with the credible interval of the overall prevalence p0. Furthermore, in each of these subgroups the PP that HCV prevalence was greater than or equal to p0 exceeded 90%, whereas it was lower than 50% in all other histotypes. The MCMC output also made it possible to rank the group-specific prevalences at each iteration and to calculate their median ranks.
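Computing a PP and a median rank from posterior samples only needs the matrix of draws (iterations × groups). The sketch below is an illustration under stated assumptions: the counts are hypothetical, the draws come from independent Beta posteriors (a fixed-effects stand-in for the MCMC output of a fitted model), and the comparator is a fixed hypothetical reference value `p_ref`, whereas in this letter the PP compared each prevalence with the sampled overall prevalence p0 of Model C.

```python
import numpy as np

# Hypothetical HCV+ counts and group sizes for five groups (illustrative only).
groups = ["G1", "G2", "G3", "G4", "G5"]
hcv_pos = np.array([2, 5, 9, 15, 20])
n = np.array([40, 60, 50, 45, 50])

# Stand-in posterior draws: independent Beta(1 + x, 1 + n - x) posteriors,
# shape (n_draws, n_groups). In practice these would be the MCMC samples.
rng = np.random.default_rng(seed=2)
n_draws = 50_000
draws = rng.beta(1 + hcv_pos, 1 + n - hcv_pos, size=(n_draws, len(groups)))

# Posterior probability: share of draws in which the hypothesis holds.
p_ref = 0.20  # hypothetical reference prevalence
pp_above = (draws >= p_ref).mean(axis=0)

# Rank the groups within each draw (1 = lowest prevalence), then take the median.
ranks = draws.argsort(axis=1).argsort(axis=1) + 1
median_rank = np.median(ranks, axis=0)

for g, pp, r in zip(groups, pp_above, median_rank):
    print(f"{g}: PP(p >= {p_ref}) = {pp:.3f}, median rank = {r:.0f}/{len(groups)}")
```

The double `argsort` turns each row of draws into ranks; ties have probability zero for continuous draws, so no tie-handling is needed.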
The high-prevalence histotypes had median ranks between 8 and 11 out of 11, while the median ranks of the lower-prevalence subgroups ranged between 1 and 6. The posterior probabilities that the group-specific prevalence exceeds p0 and the median ranks of prevalence obtained from Model C are detailed in Table I. The median ranks of the posterior prevalences made it easy to identify a low- and a high-prevalence subset (median rank ≤6/11 and >6/11, respectively).
Figure 1. HCV seroprevalence point estimates and corresponding 95% CI in the different histotypes according to the fitted models.
Model D further highlighted the existence of disorders with high and low HCV seroprevalence. The estimated overall HCV seroprevalences of the 2 subsets of lymphoma (high-prevalence vs. low-prevalence) had nonoverlapping 95% CIs (Fig. 1, light grey bands), and the estimates for each lymphoma subtype were pulled toward the overall prevalence of the corresponding subset (p1 and p2 in Table I; midpoints of the light grey bands in Fig. 1); this is particularly evident for splenic MZL.
In this study, the ML Model A and the Bayesian fixed-effects Model B led to almost identical estimates. This makes it possible to evaluate properly the effect of the subsequent hierarchical Bayesian models. Models C and D differ from Models A and B because they impose a hierarchical structure on the group-specific prevalences. Model C rests on the hypothesis that the subgroups of lymphoproliferative disorders are not completely independent of one another with regard to HCV seroprevalence. To formalize this hypothesis, the group-specific prevalences are modeled as draws from the same probability distribution, with unknown mean and variance. This filters random variability out of the estimates, pulling them toward an overall mean (central) value.
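The shrinkage produced by a Model C-type structure can be reproduced with a small hand-written sampler. This is a minimal sketch, not the WinBUGS model used in the study: the vague hyperpriors (mu ~ N(0, 10²), sigma ~ Uniform(0, 5)), the Metropolis-within-Gibbs scheme, the step sizes, the function name `fit_model_c` and the counts are all assumptions chosen for illustration. Each logit prevalence gets a random-walk Metropolis step, the mean mu gets an exact conjugate normal draw, and sigma gets a random-walk step; expit(mu) plays the role of the overall prevalence p0.

```python
import numpy as np


def binom_loglik(theta, x, n):
    """Binomial log-likelihood on the logit scale, up to a constant."""
    return x * theta - n * np.logaddexp(0.0, theta)


def sigma_logpost(s, theta, mu):
    """Log conditional density of sigma given theta and mu (flat prior on sigma)."""
    return -len(theta) * np.log(s) - 0.5 * np.sum((theta - mu) ** 2) / s**2


def fit_model_c(x, n, iters=20_000, burn=5_000, seed=0, mu_prior_sd=10.0, sigma_max=5.0):
    """Hypothetical Metropolis-within-Gibbs sampler for
    x_j ~ Bin(n_j, expit(theta_j)), theta_j ~ N(mu, sigma^2),
    mu ~ N(0, mu_prior_sd^2), sigma ~ Uniform(0, sigma_max)."""
    rng = np.random.default_rng(seed)
    J = len(x)
    theta = np.log((x + 0.5) / (n - x + 0.5))  # empirical logits as starting values
    mu, sigma = theta.mean(), 1.0
    p_draws, p0_draws = [], []
    for it in range(iters):
        # theta_j are conditionally independent given (mu, sigma): one vectorised
        # random-walk Metropolis step per group.
        prop = theta + rng.normal(0.0, 0.5, size=J)
        log_ratio = (binom_loglik(prop, x, n) - binom_loglik(theta, x, n)
                     - 0.5 * ((prop - mu) ** 2 - (theta - mu) ** 2) / sigma**2)
        theta = np.where(np.log(rng.uniform(size=J)) < log_ratio, prop, theta)
        # mu: exact conjugate normal Gibbs draw given theta and sigma.
        prec = J / sigma**2 + 1.0 / mu_prior_sd**2
        mu = rng.normal(theta.sum() / sigma**2 / prec, np.sqrt(1.0 / prec))
        # sigma: random-walk Metropolis; proposals outside (0, sigma_max) are rejected.
        s_prop = sigma + rng.normal(0.0, 0.2)
        if 0.0 < s_prop < sigma_max and (
                np.log(rng.uniform())
                < sigma_logpost(s_prop, theta, mu) - sigma_logpost(sigma, theta, mu)):
            sigma = s_prop
        if it >= burn:
            p_draws.append(1.0 / (1.0 + np.exp(-theta)))  # group prevalences
            p0_draws.append(1.0 / (1.0 + np.exp(-mu)))    # "overall prevalence" p0
    return np.array(p_draws), np.array(p0_draws)


# Hypothetical counts for five groups (illustrative only).
x = np.array([1, 4, 6, 10, 9])
n = np.array([20, 50, 40, 30, 20])
p_draws, p0_draws = fit_model_c(x, n)
raw = x / n
post_mean = p_draws.mean(axis=0)
for r, m in zip(raw, post_mean):
    print(f"raw {r:.3f} -> posterior mean {m:.3f}")
print(f"p0 posterior mean {p0_draws.mean():.3f}")
```

The extreme groups, especially the small ones, move toward p0 while the ordering of the groups is largely preserved, which is why ranks and PP remain informative despite the shrinkage.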
Despite the “shrinkage” of the estimates toward a central value, the median rank of each group-specific prevalence and the PP that each prevalence exceeds the overall mean made it possible to single out a subset of high-prevalence histotypes. Finally, in Model D we specified 2 separate hierarchical structures, one for the high-prevalence subgroup and the other for the remaining lower-prevalence histotypes. As a result, histotype-specific estimates were pulled toward the central mean of their respective subgroup. The DIC identified Model D as the “best” model (Table I). Since Model D allows for a “two-population” hypothesis, it also captures the main clinical result, i.e., the existence of 2 subpopulations with different HCV prevalence. Bayesian hierarchical models are a flexible tool for building complex models in which prior beliefs and existing knowledge are formalized and integrated into the analysis. In this study, the Bayesian statistical approach identified DLBCL and marginal zone lymphomas as lymphoid disorders with a significantly higher HCV seroprevalence. Interestingly, these findings are similar to those of the InterLymph study,8 in which cases and controls were selected according to very strict epidemiological criteria. In addition, within the broad spectrum of marginal zone lymphomas, our study identified splenic MZL and extranodal MZL of MALT type as entities significantly related to HCV infection.
Yours sincerely,
Luca Arcaini, Cristiana Pascutto, Francesco Passamonti, Raffaele Bruno, Michele Merli, Silvia Rizzi, Ester Orlandi, Cesare Astori, Sara Rattotti, Marco Paulli and Mario Lazzarino
