Abstract

We aim to characterize the statistical distribution of the number of involved lymph nodes in breast cancer. The data are a sample of 109,618 women from the US SEER (Surveillance, Epidemiology, and End Results) program. In a first analysis, we observed a log-concave distribution with overdispersion, which ruled out a Poisson stochastic process. A Negative Binomial (NB) distribution provided an acceptable fit. Overdispersion implies that some patients are more at risk than expected, and/or that cascade processes are at work in which variability increases as more lymph nodes become involved. In a second series of analyses, we applied predictive models with and without the NB assumption. The commonly used logistic models predict only nodal status, and we found their predictive value to be poor. An NB generalized linear regression (NBGLR) allowed us to model the number of involved nodes. We argue that modeling the number of nodes, and not merely nodal status, allows nodal involvement risk to be graded and might identify patients for whom neoadjuvant treatment would be justified. Incidentally, the NBGLR found a seasonal factor affecting the number of nodes in our sample, suggesting variability in medical practice, which might warrant further investigation.
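
The overdispersion argument can be illustrated with a minimal sketch. It is not the authors' analysis: the counts are synthetic (no SEER data are used), the parameter values are arbitrary assumptions, and the function names `dispersion_index` and `nb_method_of_moments` are hypothetical. Under a Poisson model the variance-to-mean ratio is close to 1; a ratio well above 1 indicates overdispersion, in which case an NB distribution can be fitted, here by simple moment matching rather than by the regression approach of the paper.

```python
import numpy as np


def dispersion_index(counts):
    """Variance-to-mean ratio: about 1 for Poisson counts, above 1 if overdispersed."""
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()


def nb_method_of_moments(counts):
    """Moment estimates (r, p) for an NB with mean r(1-p)/p and variance r(1-p)/p**2."""
    counts = np.asarray(counts, dtype=float)
    m, v = counts.mean(), counts.var(ddof=1)
    if v <= m:
        raise ValueError("no overdispersion: NB moment estimates are undefined")
    r = m**2 / (v - m)  # shape parameter; smaller r means stronger overdispersion
    p = m / v           # success probability in NumPy's parameterization
    return r, p


rng = np.random.default_rng(0)

# Synthetic counts only; r = 0.5 and p = 0.2 are illustrative assumptions.
nb_counts = rng.negative_binomial(n=0.5, p=0.2, size=50_000)
poisson_counts = rng.poisson(lam=2.0, size=50_000)

print("Poisson dispersion index:", dispersion_index(poisson_counts))
print("NB dispersion index:     ", dispersion_index(nb_counts))
print("NB moment estimates (r, p):", nb_method_of_moments(nb_counts))
```

Both synthetic samples have a theoretical mean of 2, but the NB sample has a theoretical variance of 10, so its dispersion index is far from the Poisson value of 1.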
