Abstract

In the last decade, pharmacogenomics has emerged as an active and productive discipline, drawing on both pharmacology and genomics. With powerful high-throughput technologies, we are discovering rich connections between genes, drugs, variants and phenotypes. The productivity of pharmacogenomics research has led to a rapid expansion of the associated scientific literature, which is widely dispersed among numerous journals and difficult to track. An important set of challenges thus arises: how to track new findings? How to recall known connections? How to identify gaps in understanding? Recent advances in the technology of text mining promise to help.

Knowledge reported in the literature is often embedded in dormant PDF documents, disconnected from related information and underlying data [1]. The reality, of course, is that the entities of interest (drugs, genes, model systems, experimental methods, phenotypes and so on) are interconnected in intricate ways, via pathways and networks often too complex to commit to memory. Text mining can connect these entities, and gives us the opportunity to assess our understanding by viewing a connected map of entities. The benefit is clear: an accurate knowledge map would assist researchers, save them time and allow them to read less, cover more and (most importantly) generate hypotheses to bring to the lab.

In the context of pharmacogenomics, the key entities are drugs, gene variants and phenotypes. Knowledge of these entities and the interactions between them provides answers to associated fundamental questions: ‘which gene products metabolize a drug?’, ‘what drug response phenotypes are affected by a particular gene?’ or ‘which gene variant decreases response to drug X?’ This knowledge also forms the basis for generating candidate gene lists for important phenotypes, and for detecting unexpected drug–drug interactions. The vision of a computer-based system to answer such questions was a far-off fantasy a decade ago. However, text mining and natural language processing techniques, particularly those tailored to biomedical text, have matured in recent years, and we can now contemplate a continually updated and automatically generated comprehensive view of pharmacogenomic knowledge.

How has text mining advanced? Text mining is the process whereby information is extracted from human-generated natural language text using informatics algorithms. Thus, unstructured text is structured into a machine-readable format. The text mining task consists of several key steps: document retrieval, entity recognition (also known as ‘named entity recognition’), and relationship extraction (either simple term co-occurrence or more advanced semantic understanding). The last decade has shown great progress in each of these steps. The automation of these tasks provides a method that scales with the double-exponential growth of Medline [2], which currently adds two papers every minute [3].
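The three steps named above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not a description of any published system: the lexicons `DRUGS` and `GENES`, the sentences in `DOCUMENTS`, and all function names are hypothetical and invented for illustration. Retrieval is a plain keyword filter, entity recognition is a dictionary lookup, and relationship extraction is the simple sentence-level co-occurrence variant mentioned in the text.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical toy lexicons; a real system would draw on curated vocabularies.
DRUGS = {"warfarin", "clopidogrel"}
GENES = {"CYP2C9", "VKORC1", "CYP2C19"}

# Invented example texts standing in for retrieved abstracts.
DOCUMENTS = [
    "CYP2C9 variants reduce warfarin clearance. "
    "VKORC1 variants alter warfarin dose requirements.",
    "Clopidogrel is activated by CYP2C19. "
    "Warfarin response also depends on VKORC1.",
]


def retrieve(documents, keyword):
    """Step 1, document retrieval: keep documents that mention the keyword."""
    return [doc for doc in documents if keyword.lower() in doc.lower()]


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def recognize(sentence):
    """Step 2, entity recognition: dictionary lookup of drug and gene names."""
    tokens = re.findall(r"[A-Za-z0-9]+", sentence)
    drugs = {t.lower() for t in tokens if t.lower() in DRUGS}
    genes = {t.upper() for t in tokens if t.upper() in GENES}
    return drugs, genes


def cooccurrences(documents):
    """Step 3, relationship extraction: count gene-drug pairs per sentence."""
    counts = Counter()
    for doc in documents:
        for sentence in split_sentences(doc):
            drugs, genes = recognize(sentence)
            counts.update(product(sorted(genes), sorted(drugs)))
    return counts


if __name__ == "__main__":
    hits = retrieve(DOCUMENTS, "warfarin")
    for (gene, drug), n in cooccurrences(hits).most_common():
        print(gene, drug, n)
```

Co-occurrence only records that a gene and a drug were mentioned together; it says nothing about the kind of relationship (metabolism, dose effect, adverse response). Capturing that is the role of the "more advanced semantic understanding" the text refers to.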
