Abstract
Accurate TNM staging plays an important role in the diagnosis, treatment, and prognosis of lung cancer. In current clinical practice, the stage of a lung cancer is usually determined manually by physicians. We aimed to develop an automated lung cancer staging system using machine learning and to verify the correctness of its staging. In this work, we constructed a model that generalizes and automatically extracts features using natural language processing (NLP) techniques. The parameters required for the Tumor (T), Lymph nodes (N), and Metastases (M) categories of the eighth edition of the International Association for the Study of Lung Cancer (IASLC) TNM staging system were automatically extracted from de-identified electronic medical records covering pathology, operation notes, CT scans, PET/CT scans, cranial MRI, bone scans, and ultrasound. A Bayesian reasoning network was developed for automated staging, so that each stage was predicted automatically together with the reasoning basis for the prediction. All reports were reviewed by thoracic surgeons to obtain the gold standard for evaluation. Five hundred de-identified reports were collected as the training dataset, and the model was constructed by learning from the stages assigned by physicians. Five hundred and thirteen de-identified reports were collected as the validation dataset. The overall recall rate was 96.88%, and the agreement rate between machine prediction and physicians' diagnosis was 93.70%. Natural language processing is a useful technique for encoding medical reports in order to detect the TNM descriptors. Automated lung cancer staging using a Bayesian reasoning network achieves acceptable accuracy. The system is extensible and can be applied to the processing of large databases.
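To make the descriptor-extraction step concrete, the following is a minimal sketch of how a tumor size might be pulled from free-text report wording and mapped to a T category. It is not the model described above: the regular expression, the function names `extract_max_size_cm` and `t_category_from_size`, and the sample sentences are all hypothetical, and the mapping uses only the size thresholds of the eighth-edition T category, ignoring invasion, location, and the other descriptors that also determine T.

```python
import re
from typing import Optional

# Upper size bounds (cm, inclusive) for the T category when tumor size is the
# only descriptor considered (eighth-edition thresholds). Anything above 7 cm
# falls through to T4. Invasion and other descriptors are deliberately ignored.
SIZE_TO_T = [
    (1.0, "T1a"),
    (2.0, "T1b"),
    (3.0, "T1c"),
    (4.0, "T2a"),
    (5.0, "T2b"),
    (7.0, "T3"),
]

# Hypothetical pattern: one or more numbers joined by "x" or "×", followed by
# a unit, e.g. "2.5 x 1.8 cm" or "45 mm".
_MEASUREMENT = re.compile(
    r"((?:\d+(?:\.\d+)?\s*[x×]\s*)*\d+(?:\.\d+)?)\s*(cm|mm)\b",
    re.IGNORECASE,
)
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def extract_max_size_cm(report: str) -> Optional[float]:
    """Return the largest dimension mentioned in the text, in centimeters."""
    sizes = []
    for dims, unit in _MEASUREMENT.findall(report):
        scale = 0.1 if unit.lower() == "mm" else 1.0
        sizes.extend(float(n) * scale for n in _NUMBER.findall(dims))
    return max(sizes) if sizes else None


def t_category_from_size(size_cm: float) -> str:
    """Map a tumor size in centimeters to a size-only T category."""
    for upper_bound, label in SIZE_TO_T:
        if size_cm <= upper_bound:
            return label
    return "T4"


if __name__ == "__main__":
    text = "A 2.5 x 1.8 cm nodule in the right upper lobe."
    size = extract_max_size_cm(text)
    if size is not None:
        print(size, t_category_from_size(size))
```

A real system would also need to tell the primary tumor apart from lymph nodes and other measured structures, which this sketch does not attempt; it treats every measurement in the text as a candidate tumor dimension.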