Abstract

As part of its core business of gathering population-based information on new cancer diagnoses, the Belgian Cancer Registry receives free-text pathology reports describing the results of (pre-)malignant specimens. These reports are provided by 82 laboratories and written in one of two national languages, Dutch or French. For breast cancer, the reports characterize the status of the estrogen receptor, the progesterone receptor, and Erb-b2 receptor tyrosine kinase 2 (HER2). These biomarkers are related to tumor growth and prognosis and are essential for defining therapeutic management. Population-scale information about their status in breast cancer patients is therefore considered crucial for enriching real-world scientific studies and for guiding public health policies on personalized medicine. The main objective of this study is to expand the data available at the Belgian Cancer Registry by automatically extracting the status of these biomarkers from the pathology reports. Various types of numeric features are computed from over 1,300 manually annotated reports linked to breast tumors diagnosed in 2014. A range of popular machine learning classifiers, such as support vector machines, random forests and logistic regression, are trained on these data and compared using their F1 scores on a separate validation set. On a held-out test set, the best-performing classifiers achieve F1 scores ranging from 0.89 to 0.92 across the four classification tasks. The extraction is thus reliable and makes it possible to substantially increase the availability of this valuable information on breast cancer receptor status at the population level.
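The comparison step summarized in the abstract (several classifiers trained on numeric features, then ranked by F1 on a separate validation set) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the registry's implementation. The abstract does not say which F1 averaging was used, so macro-averaging over a hypothetical three-class label set (positive / negative / unknown) is assumed. The helper `bow_features` is a hypothetical stand-in for the unspecified "numeric features" (simple bag-of-words counts). The classifiers themselves (for instance scikit-learn's `SVC`, `RandomForestClassifier` or `LogisticRegression`) are not shown; their validation predictions are assumed to be available as lists of labels.

```python
import re
from collections import Counter

# Hypothetical label set for one biomarker (e.g. ER status); the abstract does not list the classes.
LABELS = ("positive", "negative", "unknown")


def bow_features(report, vocabulary):
    """Count occurrences of each vocabulary term in a free-text report.

    Hypothetical stand-in for the unspecified numeric features: the text is
    lowercased and split into Unicode word tokens, so accented Dutch and
    French words (e.g. "négatif") stay intact.
    """
    counts = Counter(re.findall(r"\w+", report.lower()))
    return [counts[term] for term in vocabulary]


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 scores; a class with no true positives scores 0.0."""
    pairs = list(zip(y_true, y_pred))
    per_class = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        per_class.append(2 * precision * recall / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


def select_best(val_true, val_predictions):
    """Return (name, score) of the classifier with the highest validation macro-F1."""
    scores = {name: macro_f1(val_true, preds) for name, preds in val_predictions.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Toy inputs only: one French report snippet, then four validation labels.
    print(bow_features("RE positif, HER2 négatif", ["re", "rp", "positif", "négatif", "her2"]))
    val_true = ["positive", "positive", "negative", "unknown"]
    val_predictions = {
        "svm": ["positive", "positive", "negative", "negative"],
        "logreg": ["positive", "negative", "negative", "unknown"],
    }
    best, score = select_best(val_true, val_predictions)
    print(best, round(score, 3))
```

With these toy labels, `logreg` is selected with a macro-F1 of 7/9 (about 0.778), even though `svm` gets more positive cases right, because it also recovers the rare "unknown" class. Macro-averaging weights every class equally; micro-averaging would instead weight every report equally, giving frequent classes more influence.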

Highlights

  • The Belgian Cancer Registry (BCR) is a population-based cancer registry collecting information on all new cancer diagnoses in Belgium, covering incidences at the population level since 2004 [1, 2]

  • The current study presents the first attempt at automatic extraction of estrogen receptor (ER), progesterone receptor (PR), and HER2 status from free-text Dutch and French pathology reports in Belgium

  • This work presents a method for automatic extraction of breast cancer receptor biomarker status from free-text pathology reports, using machine-learning classifiers


Summary

Introduction

The Belgian Cancer Registry (BCR) is a population-based cancer registry collecting information on all new cancer diagnoses in Belgium, covering incidence at the population level since 2004 [1, 2]. Every hospital oncological care program and every laboratory for pathological anatomy is required by law to register structured information on new (pre-)malignant cases. This structured information covers a range of patient … In addition to producing statistics on the cancer burden in Belgium [2], the BCR is involved in several prospective clinical registration projects, the evaluation of quality of care in oncology, the registration of all cyto-histological specimens in the context of screening programs for breast, colorectal and cervical cancer, and several research projects (http://kankerregister.org/Publications).

