Standardized pipelines support and facilitate integration of diverse datasets at the Rat Genome Database.

Jennifer R Smith, Marek A Tutaj, Jyothi Thota, Logan Lamers, Adam C Gibson, Akhilanand Kundurthi, Varun Reddy Gollapally, Kent C Brodie, Stacy Zacher, Stanley J F Laulederkind, G Thomas Hayman, Shur-Jen Wang, Monika Tutaj, Mary L Kaldunski, Mahima Vedi, Wendy M Demos, Jeffrey L De Pons, Melinda R Dwinell, Anne E Kwitek

doi:10.1093/database/baae132

Abstract

The Rat Genome Database (RGD) is a multispecies knowledgebase that integrates genetic, multiomic, phenotypic, and disease data across 10 mammalian species. To support cross-species, multiomics studies, and to enhance and expand on the data that RGD's expert curators manually extract from the biomedical literature, RGD imports and integrates data from multiple sources. These include major databases and a substantial number of domain-specific resources, as well as direct submissions by individual researchers. The incorporation of these diverse data types is handled by a growing list of automated import, export, data processing, and quality control pipelines. This article outlines how a standardized infrastructure for automated RGD pipelines was developed over time, summarizing key design decisions and focusing on lessons learned.
