A survey of biodiversity informatics: Concepts, practices, and challenges
Abstract: The unprecedented size of the human population, along with its associated economic activities, has an ever‐increasing impact on global environments. Across the world, countries are concerned about growing resource consumption and the capacity of ecosystems to provide resources. To effectively conserve biodiversity, it is essential to make indicators and knowledge openly available to decision‐makers in ways they can use effectively. The development and deployment of tools and techniques to generate these indicators require access to trustworthy data from biological collections, field surveys and automated sensors, molecular data, and historic academic literature. The transformation of these raw data into synthesized information that is fit for use requires many refinement steps. The methodologies and techniques applied to manage and analyze these data constitute an area usually called biodiversity informatics. Biodiversity data follow a life cycle consisting of planning, collection, certification, description, preservation, discovery, integration, and analysis. Researchers, whether producers or consumers of biodiversity data, will likely perform activities related to at least one of these steps. This article explores each stage of the life cycle of biodiversity data, discussing its methodologies, tools, and challenges. This article is categorized under: Algorithmic Development > Biological Data Mining
- Research Article
- 10.3897/biss.2.25310
- May 18, 2018
- Biodiversity Information Science and Standards
Georeferencing helps to fill biodiversity information gaps by allowing biodiversity data to be represented spatially so that valuable assessments can be conducted. The South African National Biodiversity Institute (SANBI) has embarked on a number of projects that have required the georeferencing of biodiversity data to assist in assessments for red-listing of species and measuring the protection levels of species. Data quality is an important aspect of biodiversity information. Due to a lack of standardisation in collection and recording methods, historical biodiversity data collections pose a challenge when it comes to ascertaining fitness for use or determining the quality of data. The quality of historical locality information recorded in biodiversity data collections faces scrutiny for fitness for use, as this information is critical in performing assessments. A lack of descriptive locality information, or ambiguous locality information, renders most historical biodiversity records unfit for use. Georeferencing should essentially improve the quality of biodiversity data, but how do you measure the fitness for use of georeferenced data? Through the use of the Darwin Core term coordinateUncertaintyInMeters, georeferenced data can be queried to investigate and determine the quality of the georeferenced data produced. My presentation will cover the scope of ascertaining georeferenced data quality through the use of the Darwin Core term coordinateUncertaintyInMeters, the impacts of using a controlled vocabulary in representing the coordinateUncertaintyInMeters, and will highlight how SANBI’s georeferencing efforts have contributed to data quality within the management of biodiversity information.
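As an illustration of querying georeferenced records by coordinateUncertaintyInMeters, the following is a minimal Python sketch and not SANBI's actual workflow. The band labels and thresholds (1 km, 10 km) and the function names are assumptions chosen for the example; records are represented as plain dictionaries keyed by Darwin Core term names. Empty, non-numeric, zero, or negative values are treated as "unknown", following the Darwin Core guidance that zero is not a valid value for this term.

```python
from collections import Counter

# Hypothetical uncertainty bands in metres (assumed thresholds, for illustration only).
BANDS = [(1_000, "fine"), (10_000, "moderate"), (float("inf"), "coarse")]


def uncertainty_band(record):
    """Return a quality band for one Darwin Core-style record (a dict)."""
    raw = record.get("coordinateUncertaintyInMeters")
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return "unknown"  # missing or non-numeric value
    if value <= 0:
        return "unknown"  # zero or negative uncertainty is treated as invalid
    for upper, label in BANDS:
        if value <= upper:
            return label
    return "unknown"  # reached only for NaN


def summarise(records):
    """Count how many records fall into each uncertainty band."""
    return Counter(uncertainty_band(r) for r in records)


# Made-up example records, not real occurrence data.
records = [
    {"occurrenceID": "a", "coordinateUncertaintyInMeters": "250"},
    {"occurrenceID": "b", "coordinateUncertaintyInMeters": "5000"},
    {"occurrenceID": "c", "coordinateUncertaintyInMeters": ""},
    {"occurrenceID": "d", "coordinateUncertaintyInMeters": "30000"},
    {"occurrenceID": "e"},
]
summary = summarise(records)
print(dict(summary))
```

A summary of this kind gives a quick view of how much of a georeferenced dataset is likely to be usable for a given assessment; the appropriate thresholds depend on the intended use.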
- Research Article
1
- 10.3897/biss.7.112237
- Sep 7, 2023
- Biodiversity Information Science and Standards
Biodiversity data plays a pivotal role in understanding and conserving our natural world. As the largest occurrence data aggregator, the Global Biodiversity Information Facility (GBIF) serves as a valuable platform for researchers and practitioners to access and analyze biodiversity information from across the globe (Ball-Damerow et al. 2019). However, ensuring the quality of GBIF datasets remains a critical challenge (Chapman 2005). The community emphasizes the importance of data quality and its direct impact on fitness for use in biodiversity research and conservation efforts (Chapman et al. 2020). While GBIF continues to grow in terms of the quantity of data it provides, the quality of these datasets varies significantly (Zizka et al. 2020). The biodiversity informatics community has been working diligently to ensure data quality at every step of data creation, curation, publication (Waller et al. 2021), and end-use (Gueta et al. 2019) by employing automated tools and flagging systems to identify and address issues. However, there is still more work to be done to effectively address data quality problems and enhance the fitness for use of GBIF-mediated data. I highlight a missing component in GBIF's data publication process: the absence of formal peer reviews. Although GBIF encompasses the essential elements of a data paper, including detailed metadata, data accessibility, and robust data citation mechanisms, the lack of peer review hinders the credibility and reliability of the datasets mobilized through GBIF. To bridge this gap, I propose the implementation of a comprehensive peer review system within GBIF. Peer reviews would involve subjecting GBIF datasets to rigorous evaluation by domain experts and data scientists, ensuring the accuracy, completeness, and consistency of the data.
This process would enhance the trustworthiness and usability of datasets, enabling researchers and policymakers to make informed decisions based on reliable biodiversity information. Furthermore, the establishment of a peer review system within GBIF would foster collaboration and knowledge exchange among the biodiversity community, as experts provide constructive feedback to dataset authors. This iterative process would not only improve data quality but also encourage data contributors to adhere to best practices, thereby elevating the overall standards of biodiversity data mobilization through GBIF.
- Book Chapter
- 10.1016/b978-0-323-95502-7.00135-4
- Jan 1, 2024
- Reference Module in Life Sciences
Biodiversity Informatics
- Research Article
1
- 10.3897/biss.3.37373
- Jun 26, 2019
- Biodiversity Information Science and Standards
Biodiversity informatics (BI) plays an important role in helping us know, protect and use biodiversity sustainably. It encompasses activities from data digitization, standardization, sharing and aggregation, to supporting decision and policy making. In a country like Brazil, with a large continental geographic area containing ca. 15% of the planet’s biodiversity, the challenge is even greater: stakeholders are widely distributed over a large country and the amount of data is huge. Brazil has been a part of the international BI community, including Biodiversity Information Standards (TDWG), for around two decades. Initially represented solely by the Centro de Referência em Informação Ambiental (CRIA), gradually other groups from universities, museums and institutions joined the arena. Despite the broader group of stakeholders now involved, the local community is not strong enough. From a human resources point of view, the country has very good universities that train competent professionals both in information technology (IT) and in biology or related fields. Concerning the IT professionals, not surprisingly, other industries and job opportunities are usually more attractive and few people even know about BI. This is probably not unique to Brazil. Biological sciences professionals, for their part, usually have little literacy in computing and are equally unaware of BI as a field. On the institutional level, museums, universities and other biological data owners often lack IT support for biological data management, including digitization, and systems development/maintenance. This may reflect the lack of appreciation of the importance of data and of BI as a foundation for good biodiversity science and management. The same happens when it comes to funding. Biological collections are not adequately funded and lack more than a few episodic programs to support collection and museum maintenance and digitization. 
This lack of infrastructural funding is highlighted by the tragedies of the fires at the Butantan Museum in 2010 (80,000 snakes, 180,000 spiders) and the Museu Nacional in 2018 (20 million biological specimens and objects of Brazilian and world history and art were lost). The exception is the São Paulo Research Foundation (FAPESP), which has been supporting projects since 1999 on biodiversity and BI via its successful Biota-FAPESP program, the first to tie biodiversity projects to data digitization and sharing in Brazil. The lack of institutional engagement, support, and funding affects the sustainability of many initiatives and puts long-term data availability at risk. For political reasons, Brazil only joined the Global Biodiversity Information Facility (GBIF) in 2012 as an associate (non-contributing financially and non-voting) member with a commitment to become a voting member within five years. Until recently, the Brazilian Biodiversity Information System (SiBBr), Brazil’s GBIF node, was also hindered by politics from having a solid, stable national governance and funding to help “consolidate a solid national infrastructure on biodiversity data”, and to unite the growing Brazilian BI community around it. In the international scenario, while political, cultural and funding reasons may have hindered more equitable collaborations (e.g., tools development and sharing) with countries in the Global South, competing Global North-centric projects have prevailed. Although most remaining biodiversity is in the Global South, where local engagement is crucial, in many cases southern partners still act only as data providers. Collaborative work is required with genuine co-creation, empowering all parties. Initiatives like the Living Atlases community must be recognized and welcomed as a positive shift.
Despite all of these challenges, it can be surprising how much Brazilian biodiversity science has achieved throughout the years, and it gives us hope that in the future, if some of these issues are addressed, a lot more can be done. Education and training, continued funding and institutional support, governance, and international collaboration are essential.
- Research Article
- 10.3897/tdwgproceedings.1.20275
- Aug 14, 2017
- Proceedings of TDWG
Despite the increasing availability of biodiversity data, determining the quality of data and informing would-be data consumers and users remains a significant issue. In order for data users and data owners to perform a satisfactory assessment and management of data fitness for use, they require a Data Quality (DQ) report, which presents a set of relevant DQ measures, validations, and amendments assigned to data. Determining the meaning of "fitness for use" is essential to best manage and assess DQ. To tackle the problem, the TDWG Biodiversity Data Quality (BDQ) Interest Group (IG) (https://github.com/tdwg/bdq) has proposed a conceptual framework that defines the necessary components to describe DQ needs, DQ solutions, and DQ reports (Fig. 1). It supports, in a global and collaborative environment, a consistent description of: (1) the meaning of data fitness for use in specific contexts, using the concept of a DQ profile; (2) DQ solutions, using the concepts of specifications and mechanisms; and (3) the status of quality of data according to a DQ profile, using the concept of a DQ report (Veiga 2016, Veiga et al. 2017). Based on this conceptual framework, we implemented a prototype of a Fitness for Use Backbone (FFUB) as a Node.js module (https://nodejs.org/api/modules.html) for registering and retrieving instances of the framework concepts. This prototype was built using Node.js, an asynchronous event-driven JavaScript runtime, which uses a non-blocking I/O model that makes it lightweight and efficient to build scalable network applications (https://nodejs.org). We registered our module in the npm package manager (https://www.npmjs.com) in order to facilitate its reuse and we made our source code available in GitHub (https://github.com) in order to foster collaborative development.
To test the module, we developed a simple mechanism for measuring, validating and amending the quality of datasets and records, called BDQ-Toolkit, available in the FFUB module. The source code of the FFUB module can be found at https://github.com/BioComp-USP/ffub. Installing and using the module requires Node.js version 6 or higher. Instructions for installing and using the FFUB module can be found at https://www.npmjs.com/package/ffub (Veiga and Saraiva 2017). Using the FFUB module, we defined a simple DQ profile describing the meaning of data fitness for use in a specific context by registering a hypothetical use case. Then, we registered a set of valuable information elements for the context of the use case. For measuring the quality of each valuable information element, we registered a set of DQ dimensions. To validate whether the DQ measures are good enough, a set of DQ criteria was defined and registered. Lastly, a set of DQ enhancements for amending the quality in the use case context was also defined and registered. In order to describe the DQ solution used to meet those DQ needs, we registered the BDQ-Toolkit mechanism and all the specifications implemented by it. Using these specifications and mechanism, we generated and assigned to a dataset and its records a set of DQ assertions, according to the DQ dimensions, criteria and enhancements defined in the DQ profile. Based on those assertions, we can build DQ reports by composing all the assertions assigned to the dataset or to a specific record. This DQ report describes the status of DQ of a dataset or record according to the context of the DQ profile. This module provides an interface to use the proposed conceptual framework, which allows others to register instances of its concepts. Future work will include creating a RESTful API using sophisticated methods of data retrieval.
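The flow described above (measure a DQ dimension, validate it against a criterion, propose an amendment, and compose the resulting assertions into a report) can be sketched as follows. This is a minimal Python illustration and not the FFUB or BDQ-Toolkit API, which is a Node.js module; every name here (Assertion, measure_completeness, build_report), the completeness dimension, the 0.9 threshold, and the whitespace-trimming amendment are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Assertion:
    kind: str       # "measure", "validation" or "amendment"
    name: str       # hypothetical label for the dimension, criterion or enhancement
    result: object  # measured value, pass/fail flag, or proposed new value


def measure_completeness(record, fields):
    """DQ dimension (assumed): fraction of valuable information elements that are filled."""
    filled = sum(1 for f in fields if str(record.get(f) or "").strip())
    return filled / len(fields)


def build_report(record, fields, threshold):
    """Compose measure, validation and amendment assertions for one record."""
    assertions = []
    score = measure_completeness(record, fields)
    assertions.append(Assertion("measure", "completeness", score))
    # DQ criterion (assumed): completeness must reach the given threshold.
    assertions.append(Assertion("validation", "completeness_ok", score >= threshold))
    # DQ enhancement (assumed): propose a whitespace-trimmed scientificName.
    name = str(record.get("scientificName") or "")
    if name != name.strip():
        assertions.append(Assertion("amendment", "trim_scientificName", name.strip()))
    return assertions


# Made-up record using Darwin Core term names as keys.
record = {
    "scientificName": " Puma concolor ",
    "decimalLatitude": "-23.5",
    "decimalLongitude": "",
}
fields = ["scientificName", "decimalLatitude", "decimalLongitude"]
report = build_report(record, fields, threshold=0.9)
for assertion in report:
    print(assertion.kind, assertion.name, assertion.result)
```

Keeping the amendment as a proposed value, and leaving the record itself unchanged, mirrors the idea that a DQ report describes the status of the data and leaves the decision to apply changes to the data owner.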
- Research Article
- 10.3897/biss.3.35885
- Jun 13, 2019
- Biodiversity Information Science and Standards
In 2017, funding from the Biodiversity Information Fund for Asia accelerated data mobilization and georeferencing by Pakistani herbaria. The funding directly benefited only two herbaria but, by the end of the project, 9 herbaria were involved in sharing data, 2 through GBIF (ISL 2019, SINDH 2019; codes according to Index Herbariorum) and 6 others (BANNU 2019, BGH 2019, PUP 2019, QUETTA 2019, RAW 2019, SWAT 2019) through OpenHerbarium, a Symbiota-based network. Eventually, all collections in OpenHerbarium are expected to become GBIF data providers. Additional Pakistani herbaria are being introduced to data mobilization, and several individuals have expressed interest in learning to use OpenHerbarium to generate documented checklists for teaching and research, and others in learning to link information in OpenHerbarium to other resources. These are the first steps to developing “a large group of individuals … to train, mentor, and champion [biodiversity] data use” in Pakistan, but it is important to remember that good biodiversity data starts in the field. We need to provide today’s collectors and educators with easy access to a) information about what constitutes a high-quality herbarium specimen; b) tools for making it easier to record and provide high-quality specimen data; c) simple mechanisms for sharing data in ways that provide immediately useful resources; and d) learning to make use of the data becoming available. OpenHerbarium addresses the third and fourth needs and also makes it simple for collections to become GBIF data providers. This year, the focus will be on the first two of the four needs identified. Introduction of the new resources will be used to introduce collectors and educators to the ideas underlying provision of biodiversity data that is fit for use and reuse. When Symbiota2 is functional, OpenHerbarium will be moved to that system. This will encourage development of additional tools for using biodiversity data.
All these activities are essential to helping spread understanding of the concepts integral to biodiversity informatics. It is, of course, possible “to train, build, and champion data use” using data for other parts of the world, or provided by institutions from other parts of the world, but embedding good biodiversity data practices into the fabric of a country’s biodiversity education and research activities better benefits the country if a substantial portion of the data is generated from within the country. It also helps to spread knowledge of the country’s biodiversity among its students. Consequently, our focus in developing Pakistan’s capacity in biodiversity informatics is on engaging collections and collectors in sharing biodiversity data, then helping them discover, use, and create methods for developing the insights needed to encourage wise use of the country’s biological resources, and encouraging interaction. This will lead to a “community of practice” within Pakistan that can both benefit from and contribute to an international “community of practice”.
- Research Article
49
- 10.1371/journal.pone.0178731
- Jun 28, 2017
- PLOS ONE
The increasing availability of digitized biodiversity data worldwide, provided by an increasing number of institutions and researchers, and the growing use of those data for a variety of purposes have raised concerns related to the "fitness for use" of such data and the impact of data quality (DQ) on the outcomes of analyses, reports, and decisions. A consistent approach to assess and manage data quality is currently critical for biodiversity data users. However, achieving this goal has been particularly challenging because of idiosyncrasies inherent in the concept of quality. DQ assessment and management cannot be performed if we have not clearly established the quality needs from a data user’s standpoint. This paper defines a formal conceptual framework to support the biodiversity informatics community allowing for the description of the meaning of "fitness for use" from a data user’s perspective in a common and standardized manner. This proposed framework defines nine concepts organized into three classes: DQ Needs, DQ Solutions and DQ Report. The framework is intended to formalize human thinking into well-defined components to make it possible to share and reuse concepts of DQ needs, solutions and reports in a common way among user communities. With this framework, we establish a common ground for the collaborative development of solutions for DQ assessment and management based on data fitness for use principles. To validate the framework, we present a proof of concept based on a case study at the Museum of Comparative Zoology of Harvard University. In future work, we will use the framework to engage the biodiversity informatics community to formalize and share DQ profiles related to DQ needs across the community.
- Research Article
3
- 10.5897/ijbc2015.0938
- Jun 30, 2016
- International Journal of Biodiversity and Conservation
Over the past three decades, most sub-Saharan African (SSA) countries have developed national policies, legislation, plans, and institutions that are geared towards biodiversity conservation and management. However, evidently lacking in these instruments are the mechanisms for the generation, processing and sharing of biodiversity information. This study reviews the current biodiversity policy and institutional landscapes, and their impacts on the generation, processing, sharing, and use of biodiversity information for decision-making in SSA. We employed an integrated approach for data collection including literature review, telephone interviews and questionnaire administration. Findings show that biodiversity information has primarily been mobilized in an ad hoc manner through project surveys and academic research endeavours. Currently, the majority of SSA countries still do not have standalone biodiversity policies that could prioritize biodiversity information and provide specific mechanisms and structures for the mobilization, processing and sharing of biodiversity information. Rather, efforts have focused on mainstreaming strategies and action plans into related sector policies and planning activities with potential impacts on biodiversity information. This move has not been entirely successful in sustaining efforts on biodiversity data and information generation, utilization and sharing. While the relevance of biodiversity information for national development is acknowledged by stakeholders, there are still major obstacles including: the lack of funding for data mobilization, weak institutional capacity, lack of individual competencies, and inadequate training on techniques for mobilizing biodiversity data and information.
Advocating for value-added and demand-driven biodiversity information has the potential to garner policy support and legitimacy to reach the level of importance required for investment, capacity development and specialised institutions for biodiversity conservation in SSA. Key words: Biodiversity, information, policies, institutions, sub-Saharan Africa.
- Research Article
23
- 10.1186/1471-2105-10-s14-s3
- Nov 1, 2009
- BMC Bioinformatics
Background: Increasing the quantity and quality of data is a key goal of biodiversity informatics, leading to increased fitness for use in scientific research and beyond. This goal is impeded by a legacy of geographic locality descriptions associated with biodiversity records that are often heterogeneous and not in a map-ready format. The biodiversity informatics community has developed best practices and tools that provide the means to do retrospective georeferencing (e.g., the BioGeomancer toolkit), a process that converts heterogeneous descriptions into geographic coordinates and a measurement of spatial uncertainty. Even with these methods and tools, data publishers are faced with the immensely time-consuming task of vetting georeferenced localities. Furthermore, it is likely that overlap in georeferencing effort is occurring across data publishers. Solutions are needed that help publishers more effectively georeference their records, verify their quality, and eliminate the duplication of effort across publishers. Results: We have developed a tool called BioGeoBIF, which incorporates the high throughput and standardized georeferencing methods of BioGeomancer into a beginning-to-end workflow. Custodians who publish their data to the Global Biodiversity Information Facility (GBIF) can use this system to improve the quantity and quality of their georeferences. BioGeoBIF harvests records directly from the publishers' access points, georeferences the records using the BioGeomancer web-service, and makes results available to data managers for inclusion at the source. Using a web-based, password-protected, group management system for each data publisher, we leave data ownership, management, and vetting responsibilities with the managers and collaborators of each data set.
We also minimize the georeferencing task by combining and storing unique textual localities from all registered data access points, and dynamically linking that information to the password-protected record information for each publisher. Conclusion: We have developed one of the first examples of services that can help create higher quality data for publishers mediated through the Global Biodiversity Information Facility and its data portal. This service is one step towards solving many problems of data quality in the growing field of biodiversity informatics. We envision future improvements to our service that include faster results returns and inclusion of more georeferencing engines.
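The idea of combining unique textual localities across publishers, so that each distinct description is georeferenced only once, can be sketched as follows. This is a minimal Python illustration and not BioGeoBIF's implementation: the normalisation rule (lower-casing and collapsing whitespace), the record layout, and all function names are assumptions, and the georeferencing step is a stub standing in for a call to a service such as BioGeomancer.

```python
from collections import defaultdict


def normalise(locality):
    """Collapse case and whitespace so trivially different strings match (assumed rule)."""
    return " ".join(locality.lower().split())


def group_by_locality(records):
    """Map each unique normalised locality to the (publisher, record id) pairs that use it."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalise(rec["locality"])].append((rec["publisher"], rec["id"]))
    return dict(groups)


def apply_georeferences(groups, georeference):
    """Call the georeferencing function once per unique locality and fan the result out."""
    results = {}
    for locality, users in groups.items():
        outcome = georeference(locality)  # one call per unique string
        for key in users:
            results[key] = outcome
    return results


# Made-up records from two hypothetical publishers.
records = [
    {"publisher": "P1", "id": 1, "locality": "5 km N of Pretoria"},
    {"publisher": "P2", "id": 7, "locality": "5 km  n of pretoria "},
    {"publisher": "P2", "id": 8, "locality": "Table Mountain"},
]

calls = []


def stub_georeference(locality):
    """Stand-in for a real georeferencing service; returns a placeholder, not coordinates."""
    calls.append(locality)
    return ("placeholder result for", locality)


groups = group_by_locality(records)
results = apply_georeferences(groups, stub_georeference)
print(len(records), "records,", len(calls), "georeferencing calls")
```

Because results are keyed by publisher and record identifier, each publisher can still review and accept or reject the shared georeference for its own records.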
- Research Article
5
- 10.1007/s11024-023-09491-2
- Apr 3, 2023
- Minerva
Biodiversity informatics produces global biodiversity knowledge through the collection and analysis of biodiversity data using informatics techniques. To do so, biodiversity informatics relies upon data accrual, standardization, transferability, openness, and “invisible” infrastructure. What biodiversity informatics means to society, however, cannot be adequately understood without recognizing what organizes biodiversity data. Using insights from science and technology studies, we story the organizing “visions” behind the growth of biodiversity informatics infrastructures in Sweden, an early adopter of digital technologies and a significant contributor to global biodiversity data, through interviews, scientific literature, governmental reports and popular publications. This case story discloses the organizational formation of Swedish biodiversity informatics infrastructures from the 1970s to the present day, illustrating how situated perspectives or “visions” shaped the philosophies, directions and infrastructures of its biodiversity informatics communities. Specifically, visions related to scientific progress and species loss, their institutionalization, and the need to negotiate external interests from governmental organizations led to unequal development across multiple infrastructures that contribute differently to biodiversity knowledge. We argue that such difference highlights that the social and organizational hurdles for combining biodiversity data are just as significant as the technological challenges and that the seemingly inconsequential organizational aspects of its infrastructure shape what biodiversity data can be brought together, modelled and visualised.
- Research Article
1
- 10.3897/biss.6.93884
- Aug 23, 2022
- Biodiversity Information Science and Standards
Biodiversity information and biorepositories in Bhutan were instituted in the 1990s, and have contributed towards biodiversity conservation efforts in the country including citizen-science initiatives. Bioinformatics in Bhutan is still in its initial stage, and some biorepositories have incorporated information systems for their collections. Some major biorepositories in the country include the National Herbarium, National Animal Gene Bank, National Plant Gene Bank, National Invertebrates Repository, and Taxidermy. Other repositories distributed at various institutions cover taxonomic groups such as fishes, amphibians and reptiles, butterflies and moths, mushrooms, bryophytes, and agricultural pests, among others. Biorepositories are one of the major sources of biodiversity data and information, and the National Herbarium and National Invertebrates Repository have begun digitizing their plant and invertebrate specimens. The digitized specimens will be made freely accessible through the Bhutan Biodiversity Portal and Bhutan Biodiversity Specimen Portal managed by the National Biodiversity Centre (NBC). The Bhutan Biodiversity Portal is an online consortium-based platform for biodiversity documentation of citizen science and collections. The Bhutan Biodiversity Specimen Portal is an online system for documenting herbarium specimens, especially from the National Herbarium of Bhutan (THIM). Other major information use and sharing systems related to biodiversity include the National Biodiversity Clearing House, National Access and Benefit Sharing Clearing House, National Biosafety Clearing House, Forest Information Reporting and Monitoring System, and Pests of Bhutan database. Some of the collection-based offline systems include the Royal Botanical Garden Database, National Traditional Knowledge Database, National Plant Gene Bank Information System, National Animal Gene Bank Information System, National Invertebrates Database, and institutional databases.
Citizen science initiatives in Bhutan are increasing annually with major participation from youths. The contribution from young people is playing a major role in biodiversity documentation and data use in the country. The most used applications include iNaturalist, eBird, Bhutan Biodiversity Portal, Global Biodiversity Information Facility (GBIF), and social media sites such as Facebook (groups and pages) and blogs. Molecular data and information on species are limited in the country; however, through collaboration with international partners, some molecular information, especially for new species, is available to the public through the Barcode of Life Data System (BOLD). Only a handful of laboratories conduct basic molecular studies on biodiversity in the country. The country’s molecular information would be enhanced with the establishment of advanced molecular laboratories and capacities in bioinformatics. Advancements in bioinformatics would also contribute towards enhancing accessibility of specimen collections distributed around the country. Other major challenges include the non-availability of experts in some taxonomic groups, especially understudied groups. It is also imperative to have a coordination mechanism for sharing all data and information related to biodiversity, bringing in all the relevant national and international stakeholders.
- Research Article
- 10.3897/biss.2.26367
- Jun 15, 2018
- Biodiversity Information Science and Standards
Freshwater biodiversity is critically understudied in Rwanda, and to date there has not been an efficient mechanism to integrate freshwater biodiversity information or make it accessible to decision-makers, researchers, the private sector or communities, where it is needed for planning, management and the implementation of the National Biodiversity Strategy and Action Plan (NBSAP). A framework to capture and distribute freshwater biodiversity data is crucial to understanding how economic transformation and environmental change are affecting freshwater biodiversity and resulting ecosystem services. To optimize conservation efforts for freshwater ecosystems, detailed information is needed regarding current and historical species distributions and abundances across the landscape. From these data, specific conservation concerns can be identified, analyzed and prioritized. The purpose of this project is to establish and implement a long-term strategy for freshwater biodiversity data mobilization, sharing, processing and reporting in Rwanda. The expected outcome of the project is to support the mandates of the Rwanda Environment Management Authority (REMA), the national agency in charge of environmental monitoring and the implementation of Rwanda’s NBSAP, and the Center of Excellence in Biodiversity and Natural Resources Management (CoEB). The project also aligns with the mission of the Albertine Rift Conservation Society (ARCOS) to enhance sustainable management of natural resources in the Albertine Rift region. Specifically, organizational structure, technology platforms, and workflows for biodiversity data capture and mobilization are enhanced to promote data availability and accessibility to improve Rwanda’s NBSAP and support other decision-making processes.
The project is enhancing the capacity of technical staff from relevant government and non-government institutions in biodiversity informatics, strengthening the capacity of CoEB to achieve its mission as the Rwandan national biodiversity knowledge management center. Twelve institutions have been identified as data holders and the digitization of these data using Darwin Core standards is in progress, as well as data cleaning for the data publication through the ARCOS Biodiversity Information System (http://arbmis.arcosnetwork.org/). The release of the first national State of Freshwater Biodiversity Report is the next step. CoEB is a registered publisher to the Global Biodiversity Information Facility (GBIF) and holds an Integrated Publishing Toolkit (IPT) account on the ARCOS portal. This project was developed for the African Biodiversity Challenge, a competition coordinated by the South African National Biodiversity Institute (SANBI) and funded by the JRS Biodiversity Foundation which supports on-going efforts to enhance the biodiversity information management activities of the GBIF Africa network. This project also aligns with SANBI’s Regional Engagement Strategy, and endeavors to strengthen both emerging biodiversity informatics networks and data management capacity on the continent in support of sustainable development.
- Research Article
- 10.3897/biss.3.39215
- Aug 20, 2019
- Biodiversity Information Science and Standards
Most biodiversity research aims at understanding the states and dynamics of biodiversity and ecosystems. To do so, biodiversity research increasingly relies on the use of digital products and services such as raw data archiving systems (e.g. structured databases or data repositories), ready-to-use datasets (e.g. cleaned and harmonized files with normalized measurements or computed trends) as well as associated analytical tools (e.g. model scripts on GitHub). Several worldwide initiatives facilitate open access to biodiversity data, such as the Global Biodiversity Information Facility (GBIF), GenBank, and PREDICTS. Although these pave the way towards major advances in biodiversity research, they also typically deliver data products that are sometimes poorly informative because they fail to capture the ecological information they are intended to convey. In other words, access to ready-to-use aggregated data products may sacrifice ecological relevance for data harmonization, resulting in over-simplified, ill-advised standard formats. This is particularly true when the main challenge is to match complementary data (large diversity of measured variables, integration of different levels of biological organization, etc.) collected with different requirements and scattered across multiple databases. Improving access to raw data, meaningful detailed metadata, and analytical tools associated with standardized workflows is critical to maintain and maximize the generic relevance of ecological data. Consequently, advancing the design of digital products and services is essential for interoperability while also enhancing reproducibility and transparency in biodiversity research. To go further, a minimal common framework organizing biodiversity observation and data organization is needed. In this regard, the Essential Biodiversity Variable (EBV) concept might be a powerful way to boost progress toward this goal as well as to connect research communities worldwide.
As a national Biodiversity Observation Network (BON) node, the French BON is currently embodied by a national research e-infrastructure called "Pôle national de données de biodiversité" (PNDB, formerly ECOSCOPE), aimed at simultaneously improving the quality of scientific activities and promoting networking within the scientific community at a national level. Through the PNDB, the French BON is working on developing biodiversity data workflows oriented toward end services and products, both from and for a research perspective. More precisely, the two pillars of the PNDB are a metadata portal and a workflow-oriented web platform dedicated to the access of biodiversity data and associated analytical tools (Galaxy-E). After four years of experience, we are now going deeper into metadata specification, dataset descriptions and data structuring through the extensive use of Ecological Metadata Language (EML) as a pivot format. Moreover, we evaluate the relevance of existing tools such as Metacat/Morpho and DEIMS-SDR (Dynamic Ecological Information Management System - Site and Dataset Registry) in order to ensure a link with other initiatives like the Environmental Data Initiative, DataONE and Long-Term Ecological Research related observation networks. Regarding data analysis, an open-source Galaxy-E platform was launched in 2017 as part of a project targeting the design of a citizen science observation system in France (“65 Millions d'observateurs”). Here, we propose to showcase ongoing French activities towards global challenges related to biodiversity information and knowledge dissemination. We particularly emphasize our focus on embracing the FAIR (findable, accessible, interoperable and reusable) data principles (Wilkinson et al. 2016) across the development of the French BON e-infrastructure and the promising links we anticipate for operationalizing EBVs.
Using accessible and transparent analytical tools, we present the first online platform that allows advanced yet user-friendly analyses of biodiversity data to be performed in a reproducible and shareable way, drawing on various data sources such as GBIF, the Atlas of Living Australia (ALA), eBird and iNaturalist, together with environmental data such as climate data.
- Research Article
- 10.3897/biss.7.110675
- Aug 7, 2023
- Biodiversity Information Science and Standards
Biodiversity data are of crucial importance in the implementation of sustainable development strategies (UN Sustainable Development Goals, EU Taxonomy for Sustainable Activities), because they contribute to deepening our understanding of marine and terrestrial ecosystems and our assessment of the impact of human activities on the planet. Yet, biodiversity informatics itself is lacking an overall sustainability plan that addresses “the three E’s of environment, equity, and economics” (Brinkmann 2021) in relation to biodiversity data, their lifecycle, and related technologies. This is especially problematic given that data-driven biodiversity research has had a ten-fold increase (GBIF Secretariat 2022). This implies higher costs for data management, storage and transmission, a larger carbon footprint, and open issues about data access and reuse. SMART (Specific, Measurable, Assignable, Realistic, Time-Related) goals can help to set a sustainability strategy for biodiversity informatics because they bring together the ambitions of an organisation with the action plan necessary to realise these ambitions. When G.T. Doran proposed the use of SMART goals (Doran 1981), his aim was not to create a checklist, but a tool to get results and produce deliverables, which is precisely what is required for implementing an effective sustainability strategy. Three examples are provided here to explain the value of a SMART approach to setting sustainability goals in biodiversity informatics. Goal 1, Increase energy efficiency of computing hardware: A large part of the energy consumption in data centres is due to hardware operations. Several strategies can be pursued to increase energy efficiency in this area, such as activity optimisation for Graphics Processing Units (GPUs), improvements in the effectiveness of cooling systems, and replacement of Central Processing Units (CPUs) with more efficient GPUs.
This sustainability goal is specific because it clearly identifies what needs to be achieved; it is measurable because pre- and post-intervention energy consumption are known and can be easily compared with the expected improvement; it is assignable because data centre managers can be identified as the agents of change; it is realistic because there are multiple technological solutions to achieve the goal; and it is time-related because a roadmap for change stretching through months or years can be set up. Achieving this goal is evidently beneficial for both the budget and the carbon footprint of biodiversity informatics, as higher energy efficiency is associated with lower running costs and also contributes to reducing greenhouse gas emissions. Goal 2, Reducing data storage costs per terabyte: Data storage is becoming one of the most significant operating expenses in data-driven organisations due to the considerable increase in the amount of data produced. It is therefore imperative to reduce costs by selecting appropriate storage options (on-premises or cloud services), by adopting compressed formats whenever possible, and by avoiding data redundancy. This goal is SMART because it sets a target (the expected cost reduction can be quantified) and establishes measures (cost per terabyte) to assess its achievement. It can be assigned to data managers, who can choose among different technologies and service providers to lower costs, and can be scheduled over time according to budget needs. The achievement of this goal has clear economic benefits, but also increases the social sustainability of biodiversity informatics, because it makes data-driven research affordable even for organisations with limited financial means. Goal 3, Decreasing API (Application Programming Interface) response time and bandwidth for data transmission: Data sharing is a key activity in biodiversity informatics and APIs have become the standard technology for achieving it.
Improving API performance is therefore crucial to enhance the user experience for stakeholders interested in biodiversity data. In particular, API response time and the bandwidth required for data transmission determine whether users can access the data they are interested in immediately and download them satisfactorily. Goal 3 is SMART because API response time and bandwidth are known measures and can be improved using strategies such as caching and concurrency to handle data requests more rapidly, and delivering data in compressed formats to reduce bandwidth. Software developers and project managers involved in API design and maintenance are the people in charge of this goal. They decide the best strategy and timeline to pursue changes. Improving API performance increases the social sustainability of biodiversity informatics because it promotes uncomplicated data access for all stakeholders interested in biodiversity data. It also lowers the energy consumed in downloading data, which benefits the environment.
- Research Article
- 10.3897/biss.8.132930
- Aug 7, 2024
- Biodiversity Information Science and Standards
In recent years, the word sustainability has been used in multiple domains ranging from finance to packaging, and the list of products and activities claiming to be sustainable grows day after day. However, a general and unambiguous definition of the term sustainability is not available and might not even be advisable because sustainability is a multifaceted concept that takes its meaning from the context of use and cannot provide a set of rules applicable everywhere (Ramsey 2015). Therefore, it remains a task for all the fields that strive to be sustainable to carve out a working definition that suits their needs. Addressing the meaning of sustainability in biodiversity informatics requires paying special attention to the data practices and infrastructures of this discipline. Over time, there has been a steep increase in the amount of biodiversity data created and shared and, at the same time, a rapid proliferation of use cases for biodiversity data. For instance, fields as different as spatial planning (Underwood et al. 2018), climate change research (Hirsch 2017), and finance (Elliot et al. 2024) now rely on biodiversity data. This growth can become unmanageable for biodiversity science if its data practices and infrastructures do not scale up efficiently and are not sufficiently flexible to cater to the needs of a larger and more varied group of stakeholders. Biodiversity informatics has mainly addressed the financial sustainability of its infrastructures so far (Cook and Cochrane 2023; Kalfatovic et al. 2023). However, environmental and social sustainability are equally important to ensure that biodiversity science achieves its full potential. For instance, data infrastructures with lower environmental impact typically require less electricity to run continuously, and this will save money in the long run, especially at times of significant fluctuations in energy prices.
Again, if biodiversity data are collected and shared while keeping in mind the necessities of multiple stakeholders, the user community will become larger and this, in turn, will offer the chance to develop fairer cost contribution agreements for running data infrastructures. More examples could be added to make the case that the three pillars of sustainability—economy, society, environment (Purvis et al. 2018)—are all equally relevant when examining data practices and infrastructures in biodiversity science. Hence, there is a need to open up a discussion on sustainability within the biodiversity community, and to promote an exchange on the topic among different stakeholders interested in biodiversity data and working with them. A community effort is required to agree upon an effective definition of sustainability in biodiversity informatics. Only such a community effort can translate this definition into practical criteria and recommendations for the sustainability of biodiversity data and infrastructures, and set milestones for their implementation. The Society for the Preservation of Natural History Collections and Biodiversity Information Standards (TDWG) are authoritative members of the biodiversity community. The aim of the talk is to involve their members in the discussion on data practices and infrastructures, addressing all the three pillars of sustainability: economy, society, and environment.