Abstract
With data volumes rising steadily in many disciplines, new Big Data management systems have emerged to provide scalable tools for efficient data integration, processing, and analysis. In this article, we provide an overview of biomedical data integration systems, focusing on ontology‐based semantic systems and on systems built on Big Data technologies such as Apache Spark. We also propose a new semantic data integration system, called Integrated Proteomics Data System (IPDS), which uses a mediator approach. IPDS provides users with a unified interface for query processing and data exploration. The system takes advantage of the Apache Spark framework to perform the query transformation and execution needed to query the integrated data sources. We develop a domain ontology that allows users to formulate their queries using terms defined in the ontology. IPDS is a case study of semantic proteomics data integration linking four data sources: UniProt (protein annotation), STRING (protein‐protein interaction), PDB (protein structure), and PubMed (biomedical citations).
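To make the mediator idea concrete, the following is a minimal sketch in plain Python, not the IPDS implementation and not Spark code. It assumes a hypothetical mapping from ontology terms to source fields, two toy in-memory stand-ins for UniProt and STRING, and invented record values; every name in it (`ONTOLOGY_MAPPING`, `WRAPPERS`, `mediate`, the field names) is an illustrative assumption rather than something taken from the article.

```python
# Hypothetical mediator sketch: all names, fields, and records are illustrative.
from typing import Callable, Dict, List, Tuple

# Toy in-memory stand-ins for two of the integrated sources (invented data).
UNIPROT: List[dict] = [
    {"accession": "P1", "protein_name": "Example protein A"},
    {"accession": "P2", "protein_name": "Example protein B"},
]
STRING: List[dict] = [
    {"protein_a": "P1", "protein_b": "P2", "score": 0.9},
]

# Assumed mapping from ontology terms to (source, field in that source).
ONTOLOGY_MAPPING: Dict[str, Tuple[str, str]] = {
    "Protein.name": ("uniprot", "protein_name"),
    "Protein.interactsWith": ("string", "protein_b"),
}


def wrap_uniprot(accession: str) -> List[dict]:
    """Wrapper: return UniProt-like records for one accession."""
    return [r for r in UNIPROT if r["accession"] == accession]


def wrap_string(accession: str) -> List[dict]:
    """Wrapper: return STRING-like interaction records for one accession."""
    return [r for r in STRING if r["protein_a"] == accession]


WRAPPERS: Dict[str, Callable[[str], List[dict]]] = {
    "uniprot": wrap_uniprot,
    "string": wrap_string,
}


def mediate(accession: str, terms: List[str]) -> dict:
    """Rewrite an ontology-level query into per-source lookups and merge results."""
    result: dict = {"accession": accession}
    for term in terms:
        source, field = ONTOLOGY_MAPPING[term]  # query transformation step
        rows = WRAPPERS[source](accession)      # source-specific execution
        result[term] = [row[field] for row in rows]
    return result


if __name__ == "__main__":
    print(mediate("P1", ["Protein.name", "Protein.interactsWith"]))
```

In a Spark-based setting, the per-source lookups would presumably run as distributed queries rather than list comprehensions; the sketch only illustrates how a user can ask in ontology terms while the mediator resolves which source and field answer each term.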
More From: Concurrency and Computation: Practice and Experience