Abstract

Introduction: Characterizing mutations for both pathogenicity and drug response is indispensable to the analysis of tumor genomics and the development of therapeutic options. While a great deal of data has been deposited in structured genomic databases, a large portion of insights are found primarily, and often solely, in the biomedical literature. Medline contains about 26 million literature citations, far more than a human can realistically read. Machine-based approaches are therefore needed to comprehensively capture the landscape of reported mutations. Method: An automated pattern-matching method is used to extract mutations from Medline abstracts that are presented in Human Genome Variation Society (HGVS) format or as RefSNP (rs) numbers. A typical HGVS protein mutation is described as [reference amino acid][position][new amino acid], as in p.His1047Arg, His1047Arg, or simply H1047R. The method identifies all mentioned protein mutations and consolidates their alternate formulations. Result: Over 300,000 unique abstract-mutation pairs were identified, covering 90,000 unique mutations. Well-known cancer mutations such as BRAF V600E, JAK2 V617F, and EGFR L858R are among the most frequently appearing in the oncology literature. At the other end, 51,000 mutations are mentioned in just a single abstract, 16,000 in two abstracts, 7,600 in three abstracts, and so forth. Conclusion: The number of mutations appearing in Medline abstracts represents just a small portion of the 2 million unique coding mutations contained in the COSMIC database. While we expect literature coverage of mutations to be more comprehensive if this approach is extended to full-text articles, the number would likely remain small compared with the total reported COSMIC mutations. One of the great challenges in oncology is characterizing variants of unknown significance (VUS). By first extracting all reported mutations, even those mentioned in only one article, together with their specific biological context, we can begin to identify broader patterns in mutations’ pathogenicity and their impact on drug response.

Citation Format: Takahiko Koyama, Kahn Rhrissorrakrai, Laxmi Parida. A survey of mutations in biomedical literature using a machine based approach [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2017; 2017 Apr 1-5; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2017;77(13 Suppl):Abstract nr 3574. doi:10.1158/1538-7445.AM2017-3574
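
The pattern-matching step described in the Method can be illustrated with a minimal sketch. This is not the authors' implementation, which the abstract does not detail. It assumes regular expressions over abstract text, handles only simple substitutions among the 20 standard amino acids, and normalizes three-letter and one-letter forms to a single key such as H1047R. The names `extract_mutations` and `count_abstracts_per_mutation`, and the sample rs number used in testing, are hypothetical.

```python
import re
from collections import Counter

# Three-letter to one-letter codes for the 20 standard amino acids.
AA3_TO_1 = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
AA3 = "|".join(AA3_TO_1)
AA1 = "ACDEFGHIKLMNPQRSTVWY"

# Assumed pattern: optional "p." prefix, then [ref][position][alt]
# written with either three-letter or one-letter residue codes.
PROTEIN_RE = re.compile(
    rf"\b(?:p\.)?(?:(?P<ref3>{AA3})(?P<pos3>\d+)(?P<alt3>{AA3})"
    rf"|(?P<ref1>[{AA1}])(?P<pos1>\d+)(?P<alt1>[{AA1}]))\b"
)
# RefSNP identifiers such as rs123456.
RS_RE = re.compile(r"\brs\d+\b")


def extract_mutations(text):
    """Return the set of normalized mutation keys found in one abstract."""
    found = set()
    for m in PROTEIN_RE.finditer(text):
        if m.group("ref3"):
            ref = AA3_TO_1[m.group("ref3")]
            pos = m.group("pos3")
            alt = AA3_TO_1[m.group("alt3")]
        else:
            ref, pos, alt = m.group("ref1"), m.group("pos1"), m.group("alt1")
        # Consolidate alternate formulations into one-letter form.
        found.add(f"{ref}{pos}{alt}")
    found.update(RS_RE.findall(text))
    return found


def count_abstracts_per_mutation(abstracts):
    """Map each mutation key to the number of abstracts mentioning it.

    `abstracts` is a dict of abstract id -> abstract text.
    """
    counts = Counter()
    for text in abstracts.values():
        # A set per abstract means repeated mentions count only once.
        counts.update(extract_mutations(text))
    return counts
```

A pattern this simple also matches tokens that merely look like mutations, such as the cell line name T47D, and it does not attach a gene name to each hit, so a practical pipeline would need additional context filtering.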
