Abstract

Translational research is a growing field of science that seeks to discover the molecular underpinnings of diseases and treatment outcomes in any individual patient (Horig, Marincola et al. 2005). This mission has driven researchers out of isolated, discipline-oriented studies and into collaborative, trans-disciplinary research efforts known as team science (Guimera, Uzzi et al. 2005). In this new scientific arena, the ability to search for an individual’s biomedical data across various domains and sources via a common computational platform is a vital component for the formulation of sophisticated hypotheses and research decisions. Biomedical data is composed of records from both clinical practice and basic research, and each sector has distinct data governance policies and database management rules. Basic biological research data sources are open: some 1,230 curated databases are available in the public domain and accessible through the Internet (Cochrane and Galperin 2010). In contrast, all primary clinical data sources are kept private with rigorous data access controls, due to Health Insurance Portability and Accountability Act (HIPAA) regulations (Faddick 1997). Furthermore, while basic biological research data sources frequently make data elements, database schemas, metadata information and application programming interfaces (APIs) available to the public, the majority of clinical data sources are hosted by proprietary commercial software whose vendors (or developers) usually disclose little information about schema and metadata to third parties. Finally, while most basic research (e.g., biological molecule or pathway) data sources must have data integrity at the species level, translational research requires data integrity at the individual patient level. Indeed, integrated and individualized biomedical data sources will need to make a significant contribution to translational research in order to truly achieve personalized medicine.
However, generating such data sources is a more difficult task than the already challenging mission of integrating basic biological research data (Stein 2008). Data integration is the process of combining data from different sources into a unified format with consistent description and logical organization. After more than two decades of research, the topic continues to become more challenging due to increasing demands and persistent obstacles (Batini, Lenzerini et al. 1986; Bernstein and Haas 2008; Agrawal, Ailamaki et al. 2009). In this chapter, we focus on the issues that must be addressed to fulfil the demands for individualized biomedical data integration and introduce a customized warehousing approach for this particular purpose.
