Abstract
The agricultural genomics community has numerous data submission standards available but little experience describing and storing single-cell (e.g., scRNAseq) data. Other single-cell genomics infrastructure efforts, such as the Human Cell Atlas Data Coordination Platform (HCA DCP), have resources that could benefit our community. For example, the HCA DCP is integrated with Terra, a cloud-native workbench for computational biology developed by the Broad Institute, Verily, and Microsoft that hosts tools for single-cell genomics analysis. We will describe a pilot-scale project to determine whether our current metadata standards for livestock and crops can be used to ingest scRNAseq datasets in a manner consistent with HCA DCP standards, and whether established resources (e.g., Terra) can be used to analyze the ingested data. Currently, the most comprehensive data ingestion portal for high-throughput sequencing datasets from plants, fungi, protists, and animals (including humans) is Annotare, located at the EMBL-European Bioinformatics Institute (EMBL-EBI). Annotare ensures that sufficient metadata are collected to enable re-analysis and dissemination via the Single Cell Expression Atlas (SCEA). It supports user-directed annotation and processing of submitted data, and the data can then be searched via the SCEA and transferred to the Galaxy analysis platform. For animal datasets, another EMBL-EBI portal, the FAANG portal, provides access to both bulk and scRNAseq data. scRNAseq data and metadata can be submitted to FAANG through a semi-automated process. We have extended this submission tool for scRNAseq data so that files can also be validated with the HCA DCP metadata and data validation service. These files are then ingested through EMBL-EBI's HCA DCP ingestion service and transferred to Terra for further analysis. Once incorporated, the datasets will augment the DCP resource available to the scientific community.
In an extension of these efforts, we have also created a Shiny-based web application, Shiny-PIGGI, for the single-cell transcriptomic study of pig immune tissues and peripheral blood mononuclear cells. It will be an important resource for improving the annotation of porcine immune genes and cell types. Shiny-PIGGI (https://shinypiggi.ansci.iastate.edu) is implemented entirely in R, runs in any modern web browser, and requires no programming. It therefore increases accessibility by removing the technical training otherwise needed to work with Seurat objects and the related R packages commonly used in scRNAseq analysis. Our main goal was to develop an interactive web application that allows users such as animal scientists and immunologists to visualize and analyze biological datasets. This tool will be demonstrated at the meeting. We intend to build further on these tools to construct a scientist-friendly data resource and analytical ecosystem that facilitates single-cell genomic analysis through data ingestion, storage, retrieval, re-use, visualization, and comparative annotation across agricultural species.