Abstract

Literature reports of adverse drug events can be replicated across multiple companies, resulting in extreme duplication (defined as a majority of reports being duplicates) in the FDA Adverse Event Reporting System (FAERS) database, because such reports can escape the legacy duplicate detection algorithms routinely deployed on that data source. The literature reference field, added in 2014, could potentially be used to identify replicated reports. FAERS does not enforce adherence to the Vancouver referencing convention, so the same article may be referenced in different ways, which hinders duplicate detection. The objective of this analysis is to determine whether variations of the same literature reference observed in FAERS can be resolved with text normalization and fuzzy string matching. We normalized the literature references recorded in the FAERS database through the first quarter of 2021 with a rule-based algorithm so that they better conform to the Vancouver convention. Levenshtein distance was then used to merge sufficiently similar normalized literature references. Normalization increased the percentage of literature references that can be parsed into author, title, and journal from 61.74% to 93.93%. About 98% of pairs within the merged groups had a Levenshtein similarity of the title above the threshold. In cases of extreme duplication, 66% to 87% of reports (median 72%) were duplicates, and these cases often involved addictovigilance scenarios. We have shown that normalized references can be merged via fuzzy string matching to improve enumeration of all the individual case safety reports that refer to the same article. Inclusion of the PubMed ID and adherence to the Vancouver convention could facilitate identification of duplicates in the FAERS dataset. Awareness of this phenomenon may improve disproportionality analysis, especially in areas such as addictovigilance.
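The normalize-then-merge idea described above can be illustrated with a minimal sketch. It is not the authors' algorithm: the normalization rules here (lowercasing, stripping punctuation, collapsing whitespace) are simplified stand-ins for the rule-based Vancouver normalization, the comparison uses the whole reference string rather than the parsed title, and the function names and the `threshold=0.9` default are hypothetical choices made only for illustration.

```python
import re


def normalize(ref: str) -> str:
    """Illustrative normalization only: lowercase, drop punctuation, collapse spaces."""
    ref = ref.lower()
    ref = re.sub(r"[^\w\s]", " ", ref)        # punctuation becomes a space
    return re.sub(r"\s+", " ", ref).strip()   # collapse runs of whitespace


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via row-wise DP."""
    if len(a) < len(b):
        a, b = b, a                           # keep the DP row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Levenshtein similarity scaled to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def group_references(refs, threshold=0.9):
    """Greedy grouping: attach each reference to the first group whose
    representative is similar enough, otherwise start a new group.
    The threshold value is an assumption, not taken from the study."""
    groups = []  # list of (normalized representative, [original references])
    for ref in refs:
        norm = normalize(ref)
        for rep, members in groups:
            if similarity(norm, rep) >= threshold:
                members.append(ref)
                break
        else:
            groups.append((norm, [ref]))
    return [members for _, members in groups]
```

Calling `group_references` on a list of raw reference strings returns lists of references judged to cite the same article; reports sharing a group would then be candidates for duplicate review. A greedy pass against one representative per group is order-dependent, so a production approach would likely need a more careful clustering step.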
