Abstract

Background: In the era of datafication, it is important that medical data are accurate and structured for multiple applications. Data for oncological staging in particular need to be accurate, both to stage and treat individual patients and to support population-level surveillance and outcome assessment. To support data extraction from free-text radiological reports, a Dutch natural language processing (NLP) algorithm was built to classify the T-stage of pulmonary tumors according to the tumor node metastasis (TNM) classification. This structuring tool was translated and validated on English radiological free-text reports. A rule-based algorithm to classify T-stage was trained and validated on, respectively, 200 and 225 English free-text radiological reports from diagnostic computed tomography (CT) obtained for staging of patients with lung cancer. The automated T-stage extracted by the algorithm from the report was compared to manual staging. A graphical user interface was built for training purposes to visualize the results of the algorithm by highlighting the extracted concepts and their modifying context.

Results: Accuracy of the T-stage classifier was 0.89 in the validation set, 0.84 when considering the T-substages, and 0.76 when only considering tumor size. Results were comparable with the Dutch results (respectively, 0.88, 0.89 and 0.79). Most errors were due to ambiguity issues that could not be resolved by the rule-based nature of the algorithm.

Conclusions: NLP can be successfully applied for staging lung cancer from free-text radiological reports in different languages. Machine learning should be introduced in a focused way, as part of a hybrid approach, to improve performance.

Highlights

  • In the era of datafication, it is important that medical data are accurate and structured for multiple applications

  • An example is a Dutch rule-based natural language processing (NLP) algorithm that can extract the T-stage for lung cancer according to the tumor node metastasis (TNM) oncology classification system from the free-text radiological reports of chest computed tomography (CT) scans [7, 8]

  • Corpus description: After institutional review board approval at the participating medical center, an existing retrospective lung cancer clinical database of patients treated at the institution was used to search for radiological reports of diagnostic CT or positron emission tomography-computed tomography (PET-CT) scans, performed at initial cancer staging


Summary

Introduction

In the era of datafication, it is important that medical data are accurate and structured for multiple applications. To support data extraction from free-text radiological reports, a Dutch natural language processing (NLP) algorithm was built to classify the T-stage of pulmonary tumors according to the tumor node metastasis (TNM) classification. This structuring tool was translated and validated on English radiological free-text reports. The Dutch rule-based algorithm extracts the T-stage for lung cancer, according to the TNM oncology classification system, from the free-text radiological reports of chest computed tomography (CT) scans [7, 8]. It may speed up workflow and enhance the quality and accuracy of the radiological report, as well as communication between health professionals.
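To make the idea of rule-based, size-only staging more concrete, the following is a minimal illustrative sketch and not the authors' algorithm. It assumes the size thresholds of the TNM 8th edition (the text above does not state which edition was used), and the names `SIZE_RULES`, `SIZE_PATTERN`, `extract_largest_size_cm`, `t_stage_from_size`, and `accuracy` are hypothetical. It ignores negation, modifying context, and non-size criteria such as invasion of adjacent structures, which a real classifier would need to handle.

```python
import re

# Upper size bounds in cm (inclusive) per T-substage.
# ASSUMPTION: TNM 8th edition thresholds; the source does not name the edition.
SIZE_RULES = [
    (1.0, "T1a"),
    (2.0, "T1b"),
    (3.0, "T1c"),
    (4.0, "T2a"),
    (5.0, "T2b"),
    (7.0, "T3"),
]

# Matches measurements such as "32 mm", "3.2 cm", or "2.1 x 3.4 cm".
SIZE_PATTERN = re.compile(
    r"(\d+(?:\.\d+)?)"                  # first dimension
    r"(?:\s*x\s*(\d+(?:\.\d+)?))?"      # optional second dimension
    r"(?:\s*x\s*(\d+(?:\.\d+)?))?"      # optional third dimension
    r"\s*(mm|cm)\b",                    # unit
    re.IGNORECASE,
)


def extract_largest_size_cm(report):
    """Return the largest measured dimension in cm, or None if none is found."""
    sizes = []
    for match in SIZE_PATTERN.finditer(report):
        dims = [float(g) for g in match.groups()[:3] if g is not None]
        largest = max(dims)
        if match.group(4).lower() == "mm":
            largest = largest / 10.0  # convert mm to cm
        sizes.append(largest)
    return max(sizes) if sizes else None


def t_stage_from_size(size_cm):
    """Map a size in cm to a T-substage using size alone (hypothetical helper)."""
    if size_cm is None:
        return None  # no measurement found; staging left undetermined
    for upper_bound, stage in SIZE_RULES:
        if size_cm <= upper_bound:
            return stage
    return "T4"  # larger than 7 cm


def accuracy(predicted, reference):
    """Fraction of reports where the automated stage equals the manual stage."""
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


if __name__ == "__main__":
    report = "Spiculated mass of 32 mm in the right upper lobe."
    print(t_stage_from_size(extract_largest_size_cm(report)))
```

A sketch like this takes the largest measurement anywhere in the report, so a size mentioned for an unrelated finding would be picked up as well. Ambiguities of that kind are the sort of error a purely rule-based approach struggles to resolve.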
