RareBERT: Transformer Architecture for Rare Disease Patient Identification using Administrative Claims

Pks Prakash, Srinivas Chilukuri, Shankar Viswanathan, Nikhil Ranade

doi:10.1609/aaai.v35i1.16122

Abstract

A rare disease is any disease that affects a very small percentage of the population (1 in 1,500). It is estimated that there are nearly 7,000 rare diseases affecting 30 million patients in the U.S. alone. Most patients with rare diseases experience multiple misdiagnoses and may never be diagnosed correctly. This is largely driven by the low prevalence of these diseases, which results in a lack of awareness among healthcare providers. Machine learning researchers have worked to develop predictive models that help diagnose patients from healthcare datasets such as electronic health records and administrative claims. Most recently, transformer models such as BEHRT, G-BERT and Med-BERT have been applied to predict diseases. However, these models were developed specifically for electronic health records (EHR) and were not designed to address rare disease challenges such as class imbalance, partial longitudinal data capture, and noisy labels. As a result, they perform poorly at predicting rare diseases compared with baselines. In addition, EHR datasets are generally confined to the hospital systems that use them and do not capture a wider sample of patients, which limits the number of rare disease patients available in the dataset. To address these challenges, we introduce RareBERT, an extension of the BERT model tailored for rare disease diagnosis and trained on administrative claims datasets. RareBERT extends Med-BERT by adding a context embedding and a temporal reference embedding. We also introduce a novel adaptive loss function to handle the class imbalance. In this paper, we present experiments on diagnosing X-Linked Hypophosphatemia (XLH), a genetic rare disease.
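The abstract says RareBERT adds a context embedding and a temporal reference embedding to the Med-BERT input, but it does not say how they are combined. The sketch below assumes the usual BERT convention of summing one vector per embedding type at each sequence position. The table sizes, the meaning of the context IDs and time bins, and the names `SummedEmbedding` and `make_table` are hypothetical, and any positional or visit-level embeddings the full model may also use are omitted.

```python
import random

# Hypothetical sizes chosen for illustration only; the abstract does not give them.
HIDDEN = 8
NUM_CODES = 100      # distinct diagnosis/procedure/drug codes (assumed)
NUM_CONTEXTS = 4     # assumed context categories, e.g. claim type
NUM_TIME_BINS = 50   # assumed temporal reference bins, e.g. months before an index date


def make_table(rows, dim, rng):
    """Randomly initialised lookup table standing in for a learned embedding."""
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(rows)]


class SummedEmbedding:
    """Hypothetical BERT-style input layer: per-position sum of three lookups."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.code = make_table(NUM_CODES, HIDDEN, rng)
        self.context = make_table(NUM_CONTEXTS, HIDDEN, rng)
        self.time = make_table(NUM_TIME_BINS, HIDDEN, rng)

    def __call__(self, codes, contexts, time_bins):
        if not (len(codes) == len(contexts) == len(time_bins)):
            raise ValueError("all input sequences must have the same length")
        out = []
        for c, ctx, t in zip(codes, contexts, time_bins):
            # Element-wise sum of the code, context and temporal vectors for this position.
            out.append([a + b + d for a, b, d in
                        zip(self.code[c], self.context[ctx], self.time[t])])
        return out


# Usage: three claim events, each with a code, a context ID and a time bin.
emb = SummedEmbedding(seed=42)
vectors = emb(codes=[3, 17, 17], contexts=[0, 1, 2], time_bins=[12, 5, 0])
print(len(vectors), len(vectors[0]))  # 3 8
```

Summation keeps the hidden size fixed no matter how many embedding types are added, which is one reason BERT-style models favor it over concatenation; whether RareBERT does the same is not stated in the abstract.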
RareBERT performs significantly better than the baseline models (79.9% AUPRC versus 30% AUPRC for Med-BERT). Owing to its transformer architecture, it is also robust to partial longitudinal data capture caused by incomplete capture of claims: its AUPRC drops by only 1.35%, compared with 12% for Med-BERT, 33.0% for the LSTM baseline, and 67.4% for the boosting-trees baseline.
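The results are reported as AUPRC (area under the precision-recall curve). This metric suits rare disease prediction because a classifier that scores at random reaches an AUPRC of roughly the positive-class prevalence, so gains over that floor stay visible under heavy class imbalance. The abstract does not say which AUPRC estimator was used; the hypothetical `average_precision` function below is a minimal sketch of the common step-wise (average precision) estimator, treating tied scores as a single threshold.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    labels: 1 for a positive (e.g. a patient with the rare disease), 0 otherwise.
    scores: model scores; higher means more likely positive.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    total_pos = sum(1 for y in labels if y == 1)
    if total_pos == 0:
        raise ValueError("average precision is undefined without positive labels")

    # Sort by descending score so that each distinct score acts as one threshold.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every example tied at this score before measuring precision/recall.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        # Add the rectangle: recall gained at this threshold times current precision.
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


print(round(average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]), 4))  # 0.8333
```

A trapezoidal interpolation of the same curve can give a somewhat different number, so AUPRC values are only directly comparable when computed with the same estimator.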

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: May 18, 2021
Citations: 12

Similar Papers

"SHARE. CARE. CURE." – A EUROPEAN REFERENCE NETWORK FOR RARE INFECTIOUS DISEASES
Emilia Severin
Romanian Archives of Microbiology and Immunology | VOL. 80
31 Dec 2021

200th ENMC International Workshop “European Reference Networks: Recommendations and Criteria in the Neuromuscular field”, 18–20 October 2013, Naarden, the Netherlands
Teresinha Evangelista ... Kate Bushby
Neuromuscular Disorders | VOL. 24
15 Mar 2014

1438. Uptake of 13-Valent Pneumococcal Conjugate Vaccine in High-Risk Adults Aged 19–64 Years: A Kaplan–Meier Approach
Jeffrey Vietri ... Birol Emir
Open Forum Infectious Diseases | VOL. 5
26 Nov 2018

Spotlight on rare diseases
The Lancet Diabetes & Endocrinology
The Lancet Diabetes & Endocrinology | VOL. 7
22 Jan 2019