Reversing global biodiversity loss will require transformational human actions and robust measurements of their effectiveness. Diversity assessment using environmental DNA (eDNA) has emerged as a cutting-edge technique with the potential to address the challenges of measuring biodiversity. Vast amounts of eDNA sequences and eDNA-based species detections are generated in scientific studies. These datasets are typically stored in a variety of repositories and in multiple formats, hindering their reuse (Berry et al. 2020). Ensuring the publication of eDNA data following the FAIR (Findable, Accessible, Interoperable, Reusable) principles (Wilkinson et al. 2016) would revolutionise environmental assessment, including monitoring of biodiversity, individual species, and interactions across extensive spatial and temporal scales, and generate critical knowledge for evidence-based management. Archiving FAIR eDNA data requires standardised data formats and vocabularies, cyberinfrastructures, guidelines, data-sharing policies, and collaboration among scientists and institutions. Some of these requirements are addressed by existing data standards and infrastructures, including Darwin Core (DwC) (Wieczorek et al. 2012), Minimum Information about any (x) Sequence (MIxS) (Yilmaz et al. 2011), the Global Biodiversity Information Facility (GBIF) network, and the International Nucleotide Sequence Database Collaboration (INSDC) partners (Arita et al. 2020). However, multiple challenges remain, and FAIR data practices have yet to be established among the eDNA community. This is partly because critical attributes unique to eDNA data are not adequately accommodated by existing standards. For example, approaches to monitoring contamination and excluding non-target taxa, and the parameters used for quality filtering and species detection, vary greatly between studies, depending on the scope of each study and the associated financial and ecological costs of incorrectly inferring presence or absence.
Making such information FAIR is essential for future studies that reuse data and require high confidence in species detection and taxonomic assignment. Furthermore, the procedures of targeted-taxon detection approaches (e.g., the interpretation of quantitative polymerase chain reaction (qPCR) results to detect the presence of DNA from individual taxa) have not yet been fully captured by existing standards. Increasing efforts have been made to establish minimum reporting requirements to validate eDNA study methods and data (Klymus et al. 2020, Thalinger et al. 2021). These requirements need further development and translation into data standards and formats that enhance machine readability and reusability, and that support and guide the eDNA community, so that they are effectively utilised. In this talk, we share our best practice guide for formatting and publishing eDNA data, developed by an international multidisciplinary working group comprising eDNA researchers, journal editors, and biodiversity and omics data scientists. We identified required data types, formats and metadata checklists by reviewing and integrating existing data standards, devising subject-specific vocabularies, and introducing additional terms to accommodate the distinctive properties of eDNA data. Implementing the FAIR eDNA data best practice guide offers a pivotal step towards standardising and enhancing the publication and reuse of eDNA data.