Semantic mashup of biomedical data

Kei-Hoi Cheung

doi:10.1016/j.jbi.2008.08.003

Abstract

As the diversity and quantity of Web-accessible data in the biomedical domain grow, there is increasing benefit in empowering end-user scientists to integrate the various data sources on their own. Traditionally, significant programming effort has been required to parse and integrate heterogeneous datasets before scientists can use them to answer interesting questions. This heterogeneity spans data formats, information models, and terminologies. Recently, a new breed of Web-based data-integration tools, called “mashups,” has been developed to simplify this process. These tools are designed to let end-users extract, format, and remix data across multiple Web sites. Examples include Dapper (http://www.dapper.net/), which allows users to extract/scrape data from Web pages visually and to publish the extracted data as feeds in formats such as Rich Site Summary (RSS) (http://web.resource.org/rss/1.0/spec); Google Maps (http://maps.google.com), which can mash up (integrate) datasets in the Keyhole Markup Language (KML) format and visualize the integrated results; and Yahoo! Pipes (http://pipes.yahoo.com/pipes/), which provides operators/widgets for mashing up heterogeneously formatted datasets (e.g., tabular, RSS, and KML formats). Beyond these user-friendly tools, Web programmers can call open Web APIs directly, such as those listed in ProgrammableWeb (http://www.programmableweb.com/). Mashup tools thus bring disparate data sources together to increase their utility to end-users. However, even with these tools and open APIs, users must still perform most of the system integration themselves. There is a need for mashups that better enable computers to help people achieve more powerful and complex data integration, involving semantic mappings across multiple information models, terminologies, and ontologies.
The term for such machine-based integration of data is “semantic mashups.” The transition to semantic mashups is made possible by Semantic Web technology (http://www.w3.org/2001/sw/), which facilitates the sharing of the meaning of data. This in turn makes it much easier to combine stovepipe systems and to integrate data in new and unexpected ways. The key components of the Semantic Web include the Resource Description Framework (RDF) as the basic data model, the Web Ontology Language (OWL) for expressive ontologies, and SPARQL for querying. This special issue highlights the transition from mashups to semantic mashups in the context of biomedicine.
At the American Medical Informatics Association’s Annual Symposium in 1998 (AMIA98), Sir Tim Berners-Lee gave the keynote speech on the role of the Web in the information-intensive era of health care and biomedical research. In his speech, Berners-Lee envisioned the transition of the Web from being human-oriented to being increasingly machine-friendly. This burgeoning vision of the machine-friendly Web later became the Semantic Web vision. Since the seminal publication on the Semantic Web in Scientific American in 2001 [1], the Semantic Web has progressed from a vision to a reality [2], although there is still some way to go before the most futuristic aspects of the original Scientific American article are realized. Adoption of the Semantic Web has been especially evident within health care and life sciences. In part, this has been driven by the World Wide Web Consortium (W3C), which created an interest group focused on the application of the Semantic Web to this domain (http://www.w3.org/2001/sw/hcls/). The group has been chartered to develop and support the use of Semantic Web technologies and practices to improve collaboration, research and development, and innovation adoption in health care and the life sciences.
This adoption is reflected in the growing number of academic papers, journal special issues (e.g., [3]), books (e.g., [4]), and conferences (e.g., [5]). An increasing number of implementations within commercial enterprises have also been documented (http://www.w3.org/2001/sw/sweo/public/UseCases/). The annual World Wide Web (WWW) conference is one of the world’s largest meetings for Web researchers, practitioners, and developers. A workshop titled “Health Care and Life Sciences Data Integration for the Semantic Web” (http://www2007.org/workshop-W2.php) was co-located with the WWW2007 conference. While Berners-Lee’s AMIA keynote speech introduced the nascent vision of the Semantic Web to the biomedical informatics community, the workshop at WWW2007 provided concrete examples of how both academic and commercial organizations are embracing the technology. Several of the papers in this special issue of JBI originated at the workshop and appear here in expanded form, while others were selected from submissions responding to the issue’s public call for papers. The aim of this special issue is to raise awareness of the benefits of using Semantic Web technology for data integration within health care and life sciences. The following section outlines the organization of this special issue and gives a brief introduction to the papers.
