Abstract
Currently, gene information for the Oryza sativa species is spread across various heterogeneous online data sources. Access methods are equally diverse: mostly web interfaces, occasionally query APIs. They are not always straightforward for domain experts to use. The challenge is to collect information from these applications quickly and combine it logically to facilitate scientific research. We developed a Python package named PyRice, a unified programming API that queries all supported databases at the same time and returns consistent output. PyRice has a modular design and implements a smart query system that adapts to the available computing resources to optimize query speed. As a result, PyRice is easy to use and produces intuitive results. PyRice is available at https://github.com/SouthGreenPlatform/PyRice. Supplementary data are available at Bioinformatics online.
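The abstract describes one call that fans out to every supported database concurrently and returns output in a consistent shape. The sketch below illustrates that general pattern under stated assumptions; it is not PyRice's actual API. The fetcher functions, the database names `db_a` and `db_b`, the field maps, and the worker-count heuristic are all hypothetical, and the fetchers return canned records instead of making network requests so the example runs offline. Threads are used because remote queries are I/O-bound.

```python
import os
from concurrent.futures import ThreadPoolExecutor


# Hypothetical fetchers standing in for real per-database clients.
# Each returns a canned record in its own source-specific shape so the
# sketch runs offline; a real fetcher would issue an HTTP request.
def fetch_db_a(gene_id):
    return {"ID": gene_id, "Description": "placeholder description A"}


def fetch_db_b(gene_id):
    return {"locus": gene_id, "desc": "placeholder description B"}


FETCHERS = {"db_a": fetch_db_a, "db_b": fetch_db_b}

# Hypothetical rules that rename source-specific keys to one shared schema.
FIELD_MAPS = {
    "db_a": {"ID": "gene_id", "Description": "description"},
    "db_b": {"locus": "gene_id", "desc": "description"},
}


def normalize(db, record):
    """Rename a record's keys so every database yields the same field names."""
    mapping = FIELD_MAPS[db]
    return {mapping.get(key, key): value for key, value in record.items()}


def query_all(gene_id, max_workers=None):
    """Query every database concurrently; return normalized records keyed by database."""
    if max_workers is None:
        # Assumed heuristic: one worker per CPU, never more than there are databases.
        max_workers = min(len(FETCHERS), os.cpu_count() or 1)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {db: pool.submit(fetch, gene_id) for db, fetch in FETCHERS.items()}
        return {db: normalize(db, future.result()) for db, future in futures.items()}


results = query_all("LOC_Os01g01010")
print(results["db_b"]["description"])  # placeholder description B
```

Normalizing at the edge, per database, keeps the rest of the pipeline independent of how each source names its fields; adding a source then means adding one fetcher and one field map.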
Highlights
Rice, a model crop plant, is a major cereal grain widely consumed by a large part of the world’s human population, especially in Asia
Information on Oryza sativa genes is published in several open-access databases that use different gene annotation models, e.g. RAPDB (Hiroaki et al, 2013) and MSU7, or dedicated IDs (e.g. SNP-SEEK (Mansueto et al, 2016) and IC4R (IC4R Project Consortium et al, 2016))
PyRice maintains a dictionary of ID mappings across databases, since each database uses one of the two ID systems, RAPDB or MSU7 (e.g. LOC_Os01g01010 = Os01g0100100, where the first ID is from MSU7 and the second from RAPDB)
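The ID-mapping dictionary described above can be sketched as a pair of lookup tables, one per direction. This is an illustrative assumption, not PyRice's internal structure: the names `ID_PAIRS`, `id_system`, and `convert` are hypothetical, only the example pair quoted above is included, and the prefix rule (MSU7 IDs start with `LOC_Os`) is inferred from that example. Values are stored as lists because the sketch does not assume the correspondence is strictly one-to-one.

```python
# Hypothetical (MSU7, RAPDB) pairs. Only the example pair from the text is
# included; a real table would be loaded from a published cross-reference file.
ID_PAIRS = [("LOC_Os01g01010", "Os01g0100100")]

# Build both lookup directions once so conversion is a dictionary lookup.
MSU7_TO_RAPDB = {}
RAPDB_TO_MSU7 = {}
for msu_id, rap_id in ID_PAIRS:
    MSU7_TO_RAPDB.setdefault(msu_id, []).append(rap_id)
    RAPDB_TO_MSU7.setdefault(rap_id, []).append(msu_id)


def id_system(gene_id):
    """Guess the ID system from the prefix (assumed: MSU7 IDs start with 'LOC_Os')."""
    return "MSU7" if gene_id.startswith("LOC_Os") else "RAPDB"


def convert(gene_id, target):
    """Return gene_id's IDs in the target system ('MSU7' or 'RAPDB'); [] if unmapped."""
    if target not in ("MSU7", "RAPDB"):
        raise ValueError(f"unknown ID system: {target}")
    source = id_system(gene_id)
    if source == target:
        return [gene_id]
    table = MSU7_TO_RAPDB if source == "MSU7" else RAPDB_TO_MSU7
    return list(table.get(gene_id, []))


print(convert("LOC_Os01g01010", "RAPDB"))  # ['Os01g0100100']
print(convert("Os01g0100100", "MSU7"))     # ['LOC_Os01g01010']
```

With such a table, a user can supply either ID form and the query layer can translate it into whatever system each target database expects before sending the request.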
Summary
Rice, a model crop plant, is a major cereal grain widely consumed by a large part of the world’s human population, especially in Asia. Many digital resources have been developed for rice genomics. Compared with human genomics, however, fewer centralized resources and analysis tools are available for rice, and most of the information currently available is scattered and patchy. For scientists, the challenge lies in integrating these data and finding useful information. In this project, we aim to build an API that solves the problem of collecting and managing information on genes and gene products from different sources. The PyRice package currently runs remote queries over ten databases and web applications. PyRice uses parallel processing to improve query speed, indexes results for fast searching, and supports exporting results to different formats.
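The summary states that PyRice indexes results for fast search and exports them in different formats, without saying how. As one plausible approach (an assumption, not PyRice's documented design), the sketch below builds an in-memory inverted index over result records and serializes them with the standard-library `json` and `csv` modules. The record layout `{gene_id: {field: value}}`, the function names, and the toy data are hypothetical.

```python
import csv
import io
import json
import re

WORD = re.compile(r"\w+")


def build_index(results):
    """Inverted index: lowercase word -> set of gene IDs whose fields contain it."""
    index = {}
    for gene_id, fields in results.items():
        for value in fields.values():
            for word in WORD.findall(str(value).lower()):
                index.setdefault(word, set()).add(gene_id)
    return index


def search(index, query):
    """Return gene IDs whose fields contain every word in query (AND semantics)."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    return set.intersection(*(index.get(word, set()) for word in words))


def export(results, fmt):
    """Serialize {gene_id: {field: value}} results as 'json' or 'csv' text."""
    if fmt == "json":
        return json.dumps(results, indent=2, sort_keys=True)
    if fmt == "csv":
        columns = sorted({key for fields in results.values() for key in fields} - {"gene_id"})
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=["gene_id", *columns])
        writer.writeheader()
        for gene_id, fields in results.items():
            writer.writerow({**fields, "gene_id": gene_id})
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


# Toy records (hypothetical IDs and values) standing in for merged query results.
results = {
    "gene_A": {"description": "toy kinase record", "source": "db_a"},
    "gene_B": {"description": "toy transporter record", "source": "db_b"},
}
index = build_index(results)
print(sorted(search(index, "kinase")))         # ['gene_A']
print(export(results, "csv").splitlines()[0])  # gene_id,description,source
```

Building the index once after the queries finish makes each later keyword search a few dictionary lookups and a set intersection, rather than a rescan of every record.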