Abstract
Background: Chronic renal disease is a global health problem. The identification of suitable biomarkers could facilitate early detection and diagnosis and allow better understanding of the underlying pathology. One of the challenges in meeting this goal is the necessary integration of experimental results from multiple biological levels for further analysis by data mining. Data integration in the life sciences is still a struggle, and many groups are looking to the benefits promised by the Semantic Web for data integration.

Results: We present a Semantic Web approach to developing a knowledge base that integrates data from high-throughput experiments on kidney and urine. A specialised KUP ontology is used to tie the various layers together, whilst background knowledge from external databases is incorporated by conversion into RDF. Using SPARQL as a query mechanism, we are able to query for proteins expressed in urine and place these back into the context of genes expressed in regions of the kidney.

Conclusions: The KUPKB gives KUP biologists the means to ask queries across many resources in order to aggregate the knowledge necessary for answering biological questions. The Semantic Web technologies we use, together with the background knowledge from the domain’s ontologies, allow both rapid conversion and integration of this knowledge base. The KUPKB is still relatively small, but questions remain about scalability, maintenance and availability of the knowledge itself.

Availability: The KUPKB may be accessed via http://www.e-lico.eu/kupkb.
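The Results paragraph describes converting data into RDF and using SPARQL to go from proteins found in urine back to genes expressed in kidney regions. A SPARQL basic graph pattern is a set of triple patterns whose shared variables must bind to the same value in every pattern. The sketch below illustrates that idea in plain Python using a naive nested-loop join over an in-memory list of triples. The `ex:` predicates (`ex:detectedIn`, `ex:encodedBy`, `ex:expressedIn`, `ex:partOf`) and resources are hypothetical stand-ins, not terms from the KUP ontology. A real deployment would use an RDF store and a SPARQL engine rather than this code.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

def is_var(term: str) -> bool:
    # SPARQL-style variables start with "?"
    return term.startswith("?")

def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in new and new[p] != t:
                return None  # variable already bound to a different value
            new[p] = t
        elif p != t:
            return None  # constant term does not match
    return new

def solve(patterns: List[Triple], triples: List[Triple],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every binding that satisfies all patterns (nested-loop join)."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        extended = match_pattern(first, triple, binding)
        if extended is not None:
            yield from solve(rest, triples, extended)

# Hypothetical toy data: urine proteomics linked to kidney gene expression.
TRIPLES: List[Triple] = [
    ("ex:protein_A", "ex:detectedIn", "ex:urine"),
    ("ex:protein_B", "ex:detectedIn", "ex:urine"),
    ("ex:protein_A", "ex:encodedBy", "ex:gene_a"),
    ("ex:protein_B", "ex:encodedBy", "ex:gene_b"),
    ("ex:gene_a", "ex:expressedIn", "ex:glomerulus"),
    ("ex:gene_b", "ex:expressedIn", "ex:liver"),
    ("ex:glomerulus", "ex:partOf", "ex:kidney"),
]

# Roughly: SELECT ?protein ?gene ?region WHERE { ...four patterns... }
QUERY: List[Triple] = [
    ("?protein", "ex:detectedIn", "ex:urine"),
    ("?protein", "ex:encodedBy", "?gene"),
    ("?gene", "ex:expressedIn", "?region"),
    ("?region", "ex:partOf", "ex:kidney"),
]

if __name__ == "__main__":
    for row in solve(QUERY, TRIPLES):
        print(row["?protein"], row["?gene"], row["?region"])
```

Only `ex:protein_A` is returned. `ex:protein_B` is found in urine, but its gene is expressed in a region that the toy data do not place in the kidney. The order of the patterns affects speed but not the result. SPARQL engines index triples and reorder patterns, which this sketch does not attempt.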
Highlights
The early detection and better understanding of renal disease is important as it will reach pandemic proportions over the next few decades [1]
Availability: The KUP Knowledge Base (KUPKB) may be accessed via http://www.e-lico.eu/kupkb
This paper presents a case study for developing a knowledge base around a focused domain in the life sciences, namely the kidney and urinary pathway (KUP)
Summary
The early detection and better understanding of (chronic) renal disease is important as it will reach pandemic proportions over the next few decades [1]. The biologist’s goal in renal disease is to understand the pathological processes and identify disease biomarkers. This requires the analysis of experimental data from multiple biological levels (e.g. genes, proteins and metabolites). These data need to be integrated with existing knowledge from databases and the scientific literature to connect the different levels. Developing new resources that integrate existing data typically involves centralising the external data within new bespoke schemas. This ‘warehousing’ approach is common in the life sciences and over time leads to an increasing number of resources, each with its own schema [7]. The Bio2RDF project [32] provides a repository of public databases that can be downloaded in RDF. These two efforts provided some of the core datasets for the background biological knowledge represented in the KUPKB. Given that the experiments are being conducted on multiple species, we required data about orthologous genes, which can be obtained from the HomoloGene database [60].
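Because the experiments span several species, a gene measured in mouse or rat has to be linked to its human counterpart before results can be compared. HomoloGene groups orthologous genes under a shared group identifier, so the lookup amounts to finding the group that contains the source gene and reading off the target species' member. The sketch below shows that lookup in plain Python. It is not KUPKB code, which represented this knowledge in RDF. It assumes a simplified three-column, tab-separated layout (group id, NCBI taxonomy id, gene symbol); the real HomoloGene data file has more columns. The group ids are made up for illustration. It also assumes at most one gene per species per group.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# NCBI taxonomy ids for the species involved
HUMAN, MOUSE, RAT = "9606", "10090", "10116"

# Made-up rows in the assumed simplified layout:
# group id <TAB> taxonomy id <TAB> gene symbol
SAMPLE = (
    "1\t9606\tUMOD\n"
    "1\t10090\tUmod\n"
    "1\t10116\tUmod\n"
    "2\t9606\tHAVCR1\n"
    "2\t10090\tHavcr1\n"
)

def load_groups(text: str) -> Dict[str, Dict[str, str]]:
    """Map each homology group id to {taxonomy id: gene symbol}."""
    groups: Dict[str, Dict[str, str]] = defaultdict(dict)
    for group_id, taxon, symbol in csv.reader(io.StringIO(text), delimiter="\t"):
        # Simplification: a later row for the same species overwrites an earlier one
        groups[group_id][taxon] = symbol
    return groups

def orthologs(symbol: str, from_taxon: str, to_taxon: str,
              groups: Dict[str, Dict[str, str]]) -> List[str]:
    """Return symbols in `to_taxon` that share a group with `symbol` in `from_taxon`."""
    hits: List[str] = []
    for members in groups.values():
        if members.get(from_taxon) == symbol and to_taxon in members:
            hits.append(members[to_taxon])
    return hits

if __name__ == "__main__":
    groups = load_groups(SAMPLE)
    print(orthologs("Umod", MOUSE, HUMAN, groups))
    print(orthologs("Havcr1", MOUSE, RAT, groups))
```

With the toy data, the mouse `Umod` maps to human `UMOD`. The mouse `Havcr1` has no rat member in its group, so the second lookup returns an empty list rather than guessing.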