Abstract
Background: Transfer learning is common practice in image classification with deep learning, where the available data are often too limited to train a complex model with millions of parameters from scratch. Transferring language models, however, requires special attention: cross-domain vocabularies (e.g., between two different modalities such as MR and US) do not always overlap, whereas pixel intensity ranges largely overlap between imaging domains. Method: We present a concept of similar-domain adaptation in which we transfer inter-institutional language models (context-dependent and context-independent) between two different modalities (ultrasound and MRI) to capture liver abnormalities. Results: Using MR and US screening exam reports for hepatocellular carcinoma as the use case, we apply the transfer-language-space strategy to automatically label imaging exams, with and without a structured template, achieving an average F1-score above 0.9. Conclusion: We conclude that transfer learning, combined with fine-tuning of the discriminative model, is often more effective for shared targeted tasks than training a language space from scratch.
Highlights
Hepatocellular carcinoma (HCC) is the most common primary liver malignancy and the fastest-rising cause of cancer-related deaths in the United States [1]
We explore recent language modeling methodologies that overcome these limitations: (1) context-independent word embedding models (Word2vec [11], GloVe [12]), where the language model (LM) learns a numerical representation of each word regardless of where the word occurs in a sentence, and (2) context-dependent word embedding models (BERT [13], ELMo [14]), which capture the context of a word, that is, its position and surroundings in a sentence
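The distinction between the two families of models can be sketched in a few lines. The vectors and the mixing rule below are purely illustrative (not values from the paper): a static embedding returns the same vector for a word everywhere, while a toy "contextual" embedding blends in the mean of the surrounding words' vectors, a crude stand-in for what BERT/ELMo learn.

```python
import numpy as np

# Toy 4-dimensional static vectors (assumed values, stand-ins for
# embeddings a Word2vec/GloVe model would learn).
static_emb = {
    "liver":  np.array([0.9, 0.1, 0.0, 0.2]),
    "lesion": np.array([0.7, 0.6, 0.1, 0.0]),
    "benign": np.array([0.1, 0.8, 0.3, 0.1]),
    "mass":   np.array([0.6, 0.5, 0.2, 0.1]),
}

def static_lookup(word, _context):
    """Context-independent: same vector wherever the word appears."""
    return static_emb[word]

def contextual_lookup(word, context):
    """Toy context-dependent embedding: mix the word's own vector with
    the mean of its context vectors (illustrative weights 0.7/0.3)."""
    ctx = np.mean([static_emb[w] for w in context if w != word], axis=0)
    return 0.7 * static_emb[word] + 0.3 * ctx

s1 = ["benign", "liver", "lesion"]
s2 = ["liver", "mass"]

# Static: identical vector for "liver" in both sentences.
assert np.allclose(static_lookup("liver", s1), static_lookup("liver", s2))
# Contextual: the vector shifts with the surrounding words.
assert not np.allclose(contextual_lookup("liver", s1),
                       contextual_lookup("liver", s2))
```

The same word therefore maps to a single point in the embedding space under (1), but to context-dependent points under (2), which is why the two families behave differently when vocabularies only partially overlap across domains.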
This trend holds for both language models, the context-independent Word2Vec as well as the context-dependent transformer model (BERT), and for both classifiers (Random Forest and 1D-CNN)
Summary
Hepatocellular carcinoma (HCC) is the most common primary liver malignancy and the fastest-rising cause of cancer-related deaths in the United States [1]. Different versions of LI-RADS are available for each of the HCC screening imaging modalities (including US and MRI). LI-RADS standardized reporting systems, and the use of structured HCC-screening imaging report templates, help facilitate information extraction from imaging reports, enabling the creation of large-scale annotated databases that can be used for clinical outcomes and machine learning research. Despite these efforts, adoption of structured reports remains low [5], and even simple tasks, like differentiating between benign and malignant exams among a large number of screening cases, require hundreds of hours of manual review by radiologists. Transferring language models requires special attention, since cross-domain vocabularies (e.g., between two different modalities such as MR and US) do not always overlap, whereas pixel intensity ranges largely overlap between imaging domains
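The transfer-language-space idea described above can be sketched end to end. Everything here is illustrative and not from the paper: the two-dimensional vectors stand in for an embedding space learned on one modality's reports, and a nearest-centroid rule stands in for the fine-tuned discriminative classifiers (Random Forest, 1D-CNN) the study actually uses. The point is only that, once source-domain (MR) reports define the decision structure in a shared space, a target-domain (US) report can be labeled without retraining from scratch.

```python
import numpy as np

# Hypothetical shared embedding space (values illustrative only),
# covering the vocabulary union of MR and US report terms.
emb = {
    "hepatic":      np.array([0.8, 0.2]),
    "carcinoma":    np.array([0.9, 0.9]),
    "cyst":         np.array([0.1, 0.7]),
    "unremarkable": np.array([0.0, 0.1]),
}

def embed_report(tokens):
    """Represent a report as the mean of its token vectors, skipping
    out-of-vocabulary terms (the non-overlapping part of the vocabulary)."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0)

# Labeled MR reports (source domain) define per-class centroids.
mr_reports = [
    (["hepatic", "carcinoma"], "malignant"),
    (["unremarkable"], "benign"),
    (["cyst", "unremarkable"], "benign"),
]
centroids = {
    label: np.mean([embed_report(t) for t, l in mr_reports if l == label],
                   axis=0)
    for label in {"benign", "malignant"}
}

def classify(tokens):
    """Nearest-centroid decision in the shared space (a toy stand-in
    for the fine-tuned discriminative model)."""
    v = embed_report(tokens)
    return min(centroids, key=lambda l: np.linalg.norm(v - centroids[l]))

# A US-style report (target domain) labeled with no US-specific training.
print(classify(["hepatic", "carcinoma"]))   # uses only MR-derived centroids
print(classify(["unremarkable"]))
```

In the paper's setting, fine-tuning the discriminative model on a small amount of target-domain data would further adapt this decision boundary, which is the step the conclusion credits over training a language space from scratch.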