Extraction of Tumor Site from Cancer Pathology Reports using Deep Filters

Abhishek K Dubey, Jacob Hinkle, J Blair Christian, Georgia Tourassi

doi:10.1145/3307339.3342173

Abstract

Purpose: Pathology reports are the primary source of information concerning the millions of cancer cases across the United States. Cancer registries manually process pathology reports to extract the pertinent information, including primary tumor site, behavior, histology, laterality, and grade. Processing a large volume of pathology reports in a timely manner is a continuing challenge for cancer registries. The purpose of this study is to develop an information extraction pipeline that extracts reportable information reliably and efficiently.

Method: We have developed a novel inverse-regression (IR) based information extraction pipeline. The inverse-regression based supervised filter has been successfully applied in many application domains. However, its application to information extraction from unstructured text is hindered primarily by the extremely high dimensionality of n-gram representations of text. In this study, we attempt to overcome this obstacle with a novel bootstrapping strategy. First, we use a mutual-information based filter to discard excessive and redundant n-gram features. This step reduces the size and improves the condition number of the sample covariance matrix, thus reducing the computational cost and improving the numerical stability of the subsequent inverse-regression step. Then we use localized sliced inverse regression (LSIR) to learn a low-dimensional discriminatory subspace for information inference. In particular, we use the k-nearest neighbors of an unlabeled pathology report in the learned representation to infer the desired information from the labeled data in a supervised manner.

Results: The experiments were conducted on a set of de-identified pathology reports with human expert labels as the ground truth. Our pipeline consistently performed better than or comparably to the best-performing state-of-the-art methods while substantially reducing training and inference times.

Conclusion: Our results demonstrate the potential of an inverse-regression based information extraction pipeline for reliable and efficient information extraction from unstructured text. The information extracted from pathology reports can be used along with clinical information, medical imaging, and genomic information to instigate discoveries in cancer research.
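The first and last stages described in the Method paragraph (mutual-information filtering of n-grams, then k-nearest-neighbor inference) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy reports, the function names, the binary presence/absence encoding, and the Hamming distance are all assumptions made for illustration, and the LSIR projection that sits between the two stages in the paper is omitted, so the neighbors are found directly in the filtered feature space.

```python
import math
from collections import Counter

def ngrams(text, n_max=2):
    """Return the set of word n-grams (n = 1..n_max) in a report."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)}

def mutual_information(docs, labels, feature):
    """MI (in nats) between presence of `feature` and the class label."""
    n = len(docs)
    joint = Counter((feature in d, y) for d, y in zip(docs, labels))
    px = Counter(feature in d for d in docs)
    py = Counter(labels)
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())

def select_features(docs, labels, k):
    """Keep the k n-grams with the highest MI (ties broken alphabetically)."""
    vocab = set().union(*docs)
    return sorted(vocab,
                  key=lambda f: (-mutual_information(docs, labels, f), f))[:k]

def knn_predict(query, docs, labels, features, k=3):
    """Majority vote among the k labeled reports nearest to `query`.

    Distance is the Hamming distance over the selected binary features
    (an assumption; the paper measures neighbors in the LSIR subspace).
    """
    def dist(a, b):
        return sum((f in a) != (f in b) for f in features)
    nearest = sorted(range(len(docs)),
                     key=lambda i: (dist(query, docs[i]), i))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Hypothetical toy corpus, invented purely for illustration.
reports = [
    ("invasive ductal carcinoma of the left breast", "breast"),
    ("breast biopsy shows ductal carcinoma in situ", "breast"),
    ("right breast mass lobular carcinoma", "breast"),
    ("adenocarcinoma of the right upper lobe lung", "lung"),
    ("lung biopsy shows squamous cell carcinoma", "lung"),
    ("left lower lobe lung nodule adenocarcinoma", "lung"),
]
docs = [ngrams(text) for text, _ in reports]
labels = [site for _, site in reports]

top = select_features(docs, labels, k=2)
prediction = knn_predict(ngrams("squamous cell carcinoma of the lung"),
                         docs, labels, top, k=3)
print(top, prediction)
```

In this toy corpus the filter reduces the unigram and bigram vocabulary to two features before any neighbor search. That reduction is the role the abstract assigns to the filter: it shrinks the covariance matrix that the inverse-regression step would otherwise have to estimate and invert.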
