Abstract
Background: Biological databases and repositories have grown in diversity and complexity over the years. This rapid expansion of existing and new sources of biological knowledge raises serious problems of data accessibility and integration. To address the growing need for unification, CellBase was created as an integrative solution. CellBase provides a centralized NoSQL database containing biological information from different and heterogeneous sources. This information is accessed through a RESTful web service API, which provides an efficient interface to the data.
Results: In this work we present PyCellBase, a Python package that provides programmatic access to the rich RESTful web service API offered by CellBase. This package offers fast and user-friendly access to biological information without the need to install a local database. In addition, a series of command-line tools is provided to perform common bioinformatic tasks, such as variant annotation. CellBase data is always available through a high-availability cluster, and queries have been tuned to ensure real-time performance.
Conclusion: PyCellBase is an open-source Python package that provides efficient access to heterogeneous biological information. It allows users to perform tasks that require a comprehensive set of knowledge resources, such as variant annotation. Queries can be easily fine-tuned to retrieve the desired information about particular biological features. PyCellBase offers the convenience of an object-oriented scripting language and makes it possible to integrate the obtained results into other Python applications and pipelines.
Highlights
Biological databases and repositories have grown in diversity and complexity over the years
CellBase has been used in applications for variant prioritization [10] and it is used for variant annotation in the 100,000 Genomes Project [11]
PyCellBase has been used as an annotation tool in variant prioritization and variant curation pipelines
Summary
Biological databases and repositories have grown in diversity and complexity over the years. CellBase provides a centralized NoSQL database containing biological information from different and heterogeneous sources. This information is accessed through a RESTful web service API, which provides an efficient interface to the data. The increase in scientific knowledge due to the massive data production from high-throughput technologies has caused unprecedented growth in the number and size of databases storing relevant biological data [1]. These data are fragmented among many resources that vary greatly in capacity, scope and organization (e.g., Ensembl [2], UniProt [3], and Reactome [4]). CellBase has been used in applications for variant prioritization [10], and it is used for variant annotation in the 100,000 Genomes Project [11].
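To make the idea of programmatic access to a RESTful API more concrete, the following is a minimal sketch of how a client might compose a query URL for one or more gene identifiers. It does not reproduce PyCellBase's own interface. The host name, the path layout (`webservices/rest/<version>/<species>/<category>/<subcategory>/<ids>/<resource>`), the function name `build_query_url`, and the query options `include` and `limit` are all assumptions made for illustration and should be checked against the documentation of the CellBase deployment in use. The sketch only builds the URL and performs no network request.

```python
from urllib.parse import quote, urlencode

# Hypothetical defaults, for illustration only (not a real CellBase server).
DEFAULT_HOST = "https://cellbase.example.org/cellbase"
DEFAULT_VERSION = "v4"


def build_query_url(category, subcategory, ids, resource, species="hsapiens",
                    host=DEFAULT_HOST, version=DEFAULT_VERSION, **options):
    """Compose a REST URL for one or more feature identifiers.

    The path layout used here is an assumed convention, not a confirmed
    description of the CellBase web services.
    """
    # Accept a single identifier or a list of identifiers.
    if isinstance(ids, str):
        ids = [ids]
    # Several identifiers are sent in one request, separated by commas.
    id_part = ",".join(quote(str(i), safe="") for i in ids)
    url = "/".join([host.rstrip("/"), "webservices", "rest", version,
                    species, category, subcategory, id_part, resource])
    # Optional query parameters, sorted so the output is deterministic.
    if options:
        url += "?" + urlencode(sorted(options.items()), safe=",")
    return url


# Example: ask for selected fields of two genes (option names are assumed).
url = build_query_url("feature", "gene", ["BRCA1", "BRCA2"], "info",
                      include="name,chromosome", limit=5)
print(url)
```

A client library can wrap URL construction like this behind per-feature objects, so that a pipeline asks for gene or variant information by identifier rather than assembling request strings by hand.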