Large-scale combining signals from both biomedical literature and the FDA Adverse Event Reporting System (FAERS) to improve post-marketing drug safety signal detection

Rong Xu, Quanqiu Wang

doi:10.1186/1471-2105-15-17

Abstract

Background: Independent data sources can be used to augment post-marketing drug safety signal detection. The vast amount of publicly available biomedical literature contains rich side effect information for drugs at all clinical stages. In this study, we present a large-scale signal boosting approach that combines over 4 million records in the US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) and over 21 million biomedical articles.

Results: The datasets comprise 4,285,097 records from FAERS and 21,354,075 MEDLINE articles. We first extracted all drug-side effect (SE) pairs from FAERS. We implemented a total of seven signal ranking algorithms and compared them before and after they were boosted with signals from MEDLINE sentences or abstracts. Finally, we manually curated all drug-cardiovascular (CV) pairs that appeared in both data sources and investigated whether our approach could detect true signals that have not been included in FDA drug labels. We extracted a total of 2,787,797 drug-SE pairs from FAERS with a low initial precision of 0.025. The ranking algorithm that combined signals from both FAERS and MEDLINE significantly improved the precision from 0.025 to 0.371 for top-ranked pairs, representing a 13.8-fold elevation in precision. We showed by manual curation that drug-SE pairs that appeared in both data sources were highly enriched with true signals, many of which have not yet been included in FDA drug labels.

Conclusions: We have developed an efficient and effective drug safety signal ranking and strengthening approach. We demonstrate that combining information from FAERS and the biomedical literature on a large scale can contribute significantly to drug safety surveillance.
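The idea of boosting a FAERS ranking with literature evidence, and of scoring the result by precision among top-ranked pairs, can be illustrated with a minimal sketch. This is not one of the seven ranking algorithms used in the study: the function names, the toy scores, the rule "pairs also found in MEDLINE rank first", and the `known_true` reference set are all assumptions made for illustration only.

```python
# Hypothetical sketch: re-rank FAERS drug-SE pairs so that pairs also seen in
# MEDLINE come first, then measure precision among the top-k ranked pairs.
# The boosting rule and all data below are illustrative assumptions.

def baseline_ranking(faers_scores):
    """Rank pairs by FAERS score alone, highest score first."""
    return sorted(faers_scores, key=lambda pair: -faers_scores[pair])


def boosted_ranking(faers_scores, medline_pairs):
    """Rank pairs found in MEDLINE ahead of the rest, then by FAERS score."""
    return sorted(
        faers_scores,
        key=lambda pair: (pair not in medline_pairs, -faers_scores[pair]),
    )


def precision_at_k(ranked_pairs, known_true, k):
    """Fraction of the top-k ranked pairs that are in the reference set."""
    top = ranked_pairs[:k]
    if not top:
        return 0.0
    return sum(pair in known_true for pair in top) / len(top)


# Toy data (made up): FAERS scores, pairs co-mentioned in MEDLINE, and a
# reference set of true drug-SE pairs.
faers_scores = {
    ("drugA", "nausea"): 5.0,
    ("drugA", "headache"): 4.0,
    ("drugB", "arrhythmia"): 3.0,
    ("drugC", "rash"): 2.0,
}
medline_pairs = {("drugB", "arrhythmia"), ("drugC", "rash")}
known_true = {("drugB", "arrhythmia"), ("drugC", "rash")}

before = precision_at_k(baseline_ranking(faers_scores), known_true, k=2)
after = precision_at_k(boosted_ranking(faers_scores, medline_pairs), known_true, k=2)
print(f"precision@2 before boosting: {before:.2f}, after: {after:.2f}")
```

In this toy setting the boosted ranking moves the two literature-supported pairs to the top, which is the qualitative effect the abstract reports at a much larger scale.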

Highlights

Post-marketing drug safety signal detection from spontaneous reporting systems is challenging, demands new types of data, and calls for new avenues for advancing the state-of-the-art in data mining approaches
We show that the precision of named entity recognition (NER) using the original MedDRA-based lexicon is 0.84, and the precision using the clean lexicon is 1.000
Compared with drug side effect detection using signals from the FDA Adverse Event Reporting System (FAERS) alone, our approach of combining signals from both FAERS and MEDLINE significantly improved performance

Summary

Introduction

Post-marketing drug safety signal detection from spontaneous reporting systems is challenging, demands new types of data, and calls for new avenues for advancing the state of the art in data mining approaches. Mining drug-side effect (drug-SE) associations from the prominent spontaneous reporting system, the US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS), is a highly active research area. We present a large-scale signal boosting approach that combines over 4 million FAERS records and over 21 million biomedical articles. Harpaz et al. recently reviewed the data mining and machine learning approaches to discovering adverse drug events from FAERS [2]. Data mining algorithms such as disproportionality analysis, correlation analysis, and multivariate regression have been developed to detect adverse drug signals from FAERS [1,2,3,4]. Health information available on the web and web search log data can also provide valuable information on drug side effects [16,17].
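As a concrete illustration of disproportionality analysis, the sketch below computes two widely used measures, the proportional reporting ratio (PRR) and the reporting odds ratio (ROR), from a 2x2 table of report counts. It is a generic example rather than the specific algorithms evaluated in this study; the class name, function names, and counts are assumptions chosen for illustration.

```python
# Minimal sketch of disproportionality analysis on spontaneous reports.
# The counts are made up; they are not taken from FAERS.
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class ContingencyTable:
    a: int  # reports mentioning the drug and the side effect
    b: int  # reports mentioning the drug without the side effect
    c: int  # reports mentioning other drugs with the side effect
    d: int  # reports mentioning other drugs without the side effect


def prr(t: ContingencyTable) -> float:
    """Proportional reporting ratio: [a / (a + b)] / [c / (c + d)]."""
    if t.a + t.b == 0 or t.c + t.d == 0 or t.c == 0:
        return math.nan  # undefined when a denominator is zero
    return (t.a / (t.a + t.b)) / (t.c / (t.c + t.d))


def ror(t: ContingencyTable) -> float:
    """Reporting odds ratio: (a * d) / (b * c)."""
    if t.b == 0 or t.c == 0:
        return math.nan  # undefined when a denominator is zero
    return (t.a * t.d) / (t.b * t.c)


table = ContingencyTable(a=20, b=80, c=100, d=9800)
print(f"PRR = {prr(table):.1f}, ROR = {ror(table):.1f}")
```

A value well above 1 means the side effect is reported disproportionately often with the drug compared with all other drugs. On its own this indicates a signal worth ranking, not a confirmed causal association.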

Methods

Results

Discussion

Conclusion