Articles published on Global Biodiversity Information Facility
1192 Search results
- New
- Research Article
- 10.37425/mgwkfj71
- Jan 16, 2026
- East African Journal of Science, Technology and Innovation
- Geofrey Sikazwe + 4 more
Cassava (Manihot esculenta) is among the most important staple crops globally. In sub-Saharan Africa, it is cultivated mainly by subsistence farmers who depend directly on it for their socio-economic welfare. However, its yield in some regions has been threatened by several diseases, especially cassava brown streak disease (CBSD). Changes in climatic conditions increase the risk of the disease spreading to other planting regions. This work aimed to identify, characterize and map the current and potential future suitable habitats for cassava and cassava brown streak disease in Africa. We obtained occurrence data for cassava in Africa from the Global Biodiversity Information Facility (GBIF), and cassava brown streak disease occurrences from published literature. We used an ensemble of four species distribution models (SDMs), together with environmental covariates, to characterize the current and future distribution of cassava and CBSD in Africa. Our results identified isothermality (Bio03, relative importance: 31.6%) as the highest contributor to the current distribution of cassava, while cassava-harvested area (CHA, 14.6%) contributed the most to the current distribution of CBSD outbreaks. The geographic distributions of these target species are also expected to shift under climate projection scenarios. Using the recent climate scenarios from the Coupled Model Intercomparison Project (CMIP6) for the mid-term (2041-2060) and long-term (2061-2080) in Africa, our study highlights suitable habitats for cassava, as well as one of its economically important diseases (CBSD). About 54.6% of the continent (16.2 million km²) is currently suitable for cassava production. These suitable habitats (i.e., suitability above 0.2) were predicted to be located predominantly in sub-Saharan Africa. On the other hand, approximately 33.7% of Africa's land area (10.2 million km²) is currently at risk of CBSD spread. 
Based on our findings, we propose that improved cassava varieties tolerant to CBSD be deployed in all cassava production regions.
- New
- Research Article
- 10.1111/1440-1703.70032
- Jan 1, 2026
- Ecological Research
- Rion Shimauchi + 2 more
ABSTRACT Satoyama is a traditional heterogeneous Japanese landscape complex including semi-natural ecosystems such as farmland, grassland, and secondary forests. In Satoyama, abandonment of farmland is increasing, contributing to the mosaic structure of these landscapes by introducing new components. However, previous research on Satoyama has largely overlooked this perspective. Spiders serve as valuable biodiversity indicators in mosaic landscapes and provide ecosystem services, such as pest control. This study presents spider collection data as an indicator of abandoned farmland's impact on surrounding biodiversity, considering it a factor in enhancing landscape mosaic complexity. Specimens were collected from different Satoyama landscape components, including dry fields, paddy fields, abandoned dry fields, and abandoned paddy fields, in Machida City, Tokyo, over four periods during June–October 2023. The resulting dataset included 629 individuals representing 94 taxa identified at the following levels: 55 species, 31 genera, 1 subfamily, and 7 families. All data were deposited in the Global Biodiversity Information Facility (GBIF) through its Japan Node and are accessible via the GBIF portal under the Creative Commons Attribution 4.0 International license (https://www.gbif.org/dataset/259d6519-f92d-4291-adb0-04a925be0d33). Detailed metadata for this dataset are available in MetaCat in JaLTER at https://doi.org/10.20783/DIAS.JLE.92.
- New
- Research Article
- 10.1007/s10584-025-04103-2
- Dec 31, 2025
- Climatic Change
- Chiara Vanalli + 3 more
Abstract Climate change is strongly impacting agriculture, reducing crop production and shifting the geographic distribution of suitable areas for crop cultivation. To safeguard future global yield and feed a growing world population, the migration of crop production areas to new suitable sites represents a way to adapt to a changing climate. Here, we aim to identify the ecological niche of agricultural Prunus species, namely peach, plum, almond, apricot and sweet cherry, and examine their expected future shifts under climate change. For each of the five species, we selected process-based phenological models from the literature for dormancy break, blooming and fruit ripening, whose fulfillment determines whether an area is suitable for crop cultivation. We simulated the current pheno-suitability across Europe and validated the estimated niches with occurrence data from the Global Biodiversity Information Facility. We then implemented the phenological models to predict potential shifts in the suitability niches under future climate change scenarios. Historically, the ecological niche of Prunus species spans mid-low European latitudes, while higher latitudes fail to satisfy the forcing requirements for blooming and fruit ripening. In the future, this constraint is expected to become less restrictive, with a northwards expansion of the suitable areas. However, this will be counterbalanced by a contraction of the niche at low latitudes due to dormancy break failures. By bridging established mechanistic knowledge on the climatic effects of plant phenological traits with citizen science observations, our work brings new insights into how fruit crops will respond to global warming.
- New
- Research Article
- 10.3897/bdj.13.e171929
- Dec 30, 2025
- Biodiversity Data Journal
- Dzmitry Lukashanets + 15 more
Background: Limno-terrestrial rotifers, particularly those of the order Bdelloidea, inhabit bryophytes, lichens, soils and other periodically moist terrestrial habitats. Despite their high abundance in all latitudinal zones, these rotifers remain poorly documented in biodiversity databases due to difficulties in preservation and morphological identification. As a result, the current knowledge of their global distribution is still highly fragmented and geographically biased, with the majority of species records concentrated in Europe (where the experts mostly collected material). Many regions, like north-eastern Asia, North and South America, Africa and Australia, remain under-represented in current knowledge on the distribution of limno-terrestrial rotifers. Recent studies suggested that Bdelloidea exhibit distinct biogeographical patterns and potentially high levels of cryptic diversity and endemism, challenging the traditional view of the omnipresence of all microscopic taxa. Comprehensive, georeferenced occurrence data are essential to advance our understanding of bdelloid biodiversity and distribution, yet such data are still scarce in global platforms like the Global Biodiversity Information Facility (GBIF). New information: The dataset provides new, georeferenced data on the occurrence of rotifer species inhabiting limno-terrestrial habitats worldwide. For the first time, occurrence records for 48 rotifer species (including nominal taxonomic ‘subspecies’) are published in GBIF. In particular, the dataset significantly expands the information on the distribution ranges of bdelloid rotifers (order Bdelloidea) in GBIF. 
We contributed 5,651 new occurrence records of bdelloid rotifers, increasing the number of records in GBIF by 47.5% (or by 61.6% when considering only previously georeferenced records). Moreover, we added 394 new records to the faunas of 49 of the 56 studied countries, in 17 of which limno-terrestrial rotifers were studied for the first time. Additionally, for 19 countries, records of bdelloid rotifers are now available in GBIF for the first time.
- Research Article
- 10.3897/biss.9.183197
- Dec 24, 2025
- Biodiversity Information Science and Standards
- Hanieh Saeedi
Introduction: Oceans cover over 70% of Earth’s surface (Cael et al. 2023). An estimated 2.2 million marine species exist, yet nearly 80% remain undescribed (Mora et al. 2011). Around 370,000 species are accepted in the World Register of Marine Species (WoRMS), but open-access occurrence data exists for only about 200,000 in the Ocean Biodiversity Information System (OBIS). Documenting marine biodiversity is vital for making evidence-based policy and management decisions in order to maintain ecosystem stability and planetary health. Initiatives such as the Census of Marine Life and open-access databases like OBIS continue to transform understanding and support the UN Decade of Ocean Science. Through data sharing and global collaboration, we can better estimate and conserve marine biodiversity by first identifying data and knowledge gaps. Methods: In two underrepresented areas, the South-West Pacific (SWP) and the Indian Ocean (IO), the current biodiversity patterns of fauna were mapped to identify the knowledge gaps and distribution patterns by depth zones. All occurrence data (Animalia) were extracted from OBIS and the Global Biodiversity Information Facility (GBIF). The occurrence records were quality-controlled in accordance with the OBIS data quality guidelines (OBIS 2025). Only accepted marine taxa were retained after cross-referencing species names with WoRMS. In total, 5,441,962 occurrence records in the SWP and 7,768,826 occurrence records in the IO were used in this study (Suppl. material 1). Results: The number of occurrence records decreased with depth in all taxa in both oceans. More than 60% of the occurrence records with available depth information were from shallow waters (0–200 m), highlighting significant knowledge gaps in deep-sea biodiversity. Still, more than 11 million km² of the SWP and IO had fewer than 50 occurrence records in shallow waters, even after data cleaning. 
Based on 5-degree latitudinal bands, the higher latitudes of the SWP (0–25°S) were less sampled, or the data were not reported, compared to the lower latitudes. Mid-latitudes of the SWP (30–45°S, eastern and western Australia) had the greatest number of distribution records, mostly related to Chordata, followed by Arthropoda and Mollusca. However, the 10–25°S latitudes of the SWP had the highest number of reported species, mostly associated with Chordata, followed by Mollusca and Arthropoda. The mid-latitudes of the IO (5–30°S) were less sampled, or the data were not reported, compared to the upper and lower latitudes. Chordata occurrence records were the exception, with a peak at 10–25°S latitudes showing the highest number of distribution records, followed by Arthropoda. Also, the 10–20°S latitudes of the IO had the greatest number of species, mostly related to Chordata, followed by Arthropoda and Mollusca (Suppl. materials 2 and 3). Application: The generated knowledge is crucial for strengthening biodiversity monitoring and ensuring rapid, accessible information for policymakers through science-policy interfaces such as the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES), thereby supporting the development of urgent conservation strategies for underrepresented and threatened marine ecosystems, such as the SWP and the IO, before it is too late.
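The aggregation by 5-degree latitudinal bands and depth zones described above can be sketched in Python. The records, field names (loosely following Darwin Core), and the mesopelagic/deep boundaries below are illustrative assumptions; the abstract itself specifies only the 0–200 m shallow zone and the 5-degree bands.

```python
from collections import Counter

# Hypothetical occurrence records; field names are illustrative,
# loosely following Darwin Core.
records = [
    {"decimalLatitude": -12.4, "depth_m": 35, "phylum": "Chordata"},
    {"decimalLatitude": -33.1, "depth_m": 850, "phylum": "Mollusca"},
    {"decimalLatitude": -18.9, "depth_m": 120, "phylum": "Arthropoda"},
]

def lat_band(lat, width=5):
    """Label a 5-degree latitudinal band, e.g. -12.4 -> '10-15S'."""
    lo = int(abs(lat) // width) * width
    hemi = "S" if lat < 0 else "N"
    return f"{lo}-{lo + width}{hemi}"

def depth_zone(depth_m):
    """Coarse depth zones; 0-200 m is the shallow zone discussed above.
    The deeper boundaries are assumptions for this sketch."""
    if depth_m <= 200:
        return "shallow (0-200 m)"
    if depth_m <= 1000:
        return "mesopelagic (200-1000 m)"
    return "deep (>1000 m)"

# Count records per (band, zone) cell to expose sampling gaps.
counts = Counter(
    (lat_band(r["decimalLatitude"]), depth_zone(r["depth_m"])) for r in records
)
for key, n in sorted(counts.items()):
    print(key, n)
```

Cells with few or zero counts in such a tally are exactly the under-sampled band-by-depth gaps the study maps.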
- Research Article
- 10.3897/biss.9.183376
- Dec 24, 2025
- Biodiversity Information Science and Standards
- Laurence Benichou + 1 more
For over a decade, substantial effort has gone into making millions of pages and taxonomic treatments available online. Yet, too often, biodiversity knowledge remains locked in poorly accessible, non-machine-readable literature and electronic resources. Indeed, most prospective publications are still produced in an “old-fashioned” way (PDF), risking a repeat of the pattern seen with legacy literature from past centuries. Even though access barriers can be demolished using retroconversion (i.e., converting a PDF into an annotated XML file once the article is published), it is far more efficient to have author- and editor-vetted annotations and semantic enhancements to the texts and data ready ahead of publication. Doing so avoids discrepancies and ensures earlier, more rapid dissemination and re-use of data. Beyond open access, the challenge for publishers is to reach interoperability, which would ultimately shorten the overly long time needed to find and analyse past literature (Fontaine et al. 2012) and therefore accelerate taxonomic description. Making data within publications findable, accessible, interoperable, and reusable (FAIR) implies text structuring, semantic annotations*1, and standardization. Furthermore, linking the data, information and knowledge contained in literature and other electronic resources to uniquely identifiable components (i.e., images, tables, references, taxonomic treatments; see Fig. 1) means that those components become re-usable in research and policy covering biodiversity and other domains. Ultimately, this enables immediate re-use of the data and the integration of publications into a comprehensive global biodiversity knowledge graph. Following the vision of the Disentis roadmap*2, this presentation will outline and demonstrate the value of advanced methods for scholarly publishing, production and FAIRisation of biodiversity data during the journal production process. 
We describe the latest developments in the scientific publishing industry and, more specifically, how to publish linked and semantically enhanced research outcomes. The presentation clarifies the concepts relevant to semantic enhancement within a publication. Highly automated, XML-first, AI-assisted existing workflows for semantic enhancement of articles are introduced, including the related standards, all dedicated to making published data FAIR by creating bi-directionally linked data from and to the published article. Semantization goes beyond metadata: all information relevant for biodiversity is isolated, annotated, and linked with persistent identifiers (Chester et al. 2019, Agosti et al. 2022), including taxonomic treatment, taxon names, material examined, and material citation, the latter including country, locality, coordinates, collection date, collectors, and specimen code. Once annotated, the XML-based publication can be disseminated through a partnership with Plazi and all components pushed to relevant databases, e.g., Ocellus, Global Biodiversity Information Facility (GBIF), Biodiversity PMC (PubMed Central), ChecklistBank, and the National Center for Biotechnology Information (NCBI). The potential reuses of the data are multiple and will benefit publishers as well as the research community. Indeed, this wide dissemination of the data contained within their articles enables publishers to create dashboards measuring, e.g., the number of articles published, the amount of data described, the richness of the data, and the number of occurrences by continent. The benefits are multiple: not only do XML-first journal production workflows save time and effort and provide FAIR-born data on the day of publication, they also optimize and accelerate the dissemination of data while providing human- and machine-readable publications. Semantically structured FAIR data pave the way for training and use of AI in publishing.
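As a rough illustration of the kind of semantic annotation described, the sketch below wraps known taxon names in machine-readable markup. The element name, attribute, and taxon list are hypothetical, loosely inspired by TaxPub-style XML, and are not the actual output of the workflows discussed.

```python
import re

# Hypothetical gazetteer of taxon names; a real pipeline would draw on
# taxonomic backbones rather than a hand-written set.
KNOWN_TAXA = {"Manihot esculenta", "Parus major"}

def annotate(text, taxa=KNOWN_TAXA):
    """Wrap each known taxon name in an (illustrative) XML element so that
    downstream tools can extract it as linked data."""
    for name in taxa:
        text = re.sub(
            re.escape(name),
            f'<taxonomicName rank="species">{name}</taxonomicName>',
            text,
        )
    return text

print(annotate("Cassava (Manihot esculenta) is a staple crop."))
```

Author- and editor-vetted markup of this kind, produced before publication, is what allows components of an article to be harvested into databases without lossy retroconversion.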
- Research Article
- 10.3897/biss.9.183270
- Dec 23, 2025
- Biodiversity Information Science and Standards
- Sachit Rajbhandari + 9 more
Environmental DNA (eDNA) is increasingly used to monitor biodiversity, biosecurity and invasive species, providing insights into species presence across ecosystems. As eDNA datasets grow, interoperability and accessibility are crucial. OBIS Australia (OBIS-AU), Australia’s node of the United Nations Educational, Scientific and Cultural Organization (UNESCO) International Oceanographic Data and Information Exchange (IODE) Ocean Biodiversity Information System (OBIS), hosted by the Commonwealth Scientific and Industrial Research Organisation (CSIRO) National Collections and Marine Infrastructure (NCMI), promotes use of the DNA Derived Data Extension in Darwin Core (DwC) to standardise publication of eDNA and metabarcoding data (Wieczorek et al. 2012, Abarenkov et al. 2023). OBIS-AU has published over 26 datasets with 21 million eDNA records to OBIS. OBIS-AU is developing a scalable and interoperable eDNA data publishing pipeline that integrates tools such as the Findable, Accessible, Interoperable, Reusable eDNA (FAIRe) suite and the Global Biodiversity Information Facility (GBIF) Metabarcoding Data Toolkit (MDT) to transform diverse eDNA source data into Darwin Core Archives (DwC-A) for publication to OBIS, GBIF, and the Atlas of Living Australia (ALA). By leveraging metadata standards including DwC, the DNA Derived Data Extension, Minimum Information about any (X) Sequence (MIxS) and the FAIRe metadata checklist, the pipeline enables standardised, FAIR-compliant data publishing (Takahashi et al. 2025, Meyer et al. 2023). It supports multiple transformation pathways, ensuring that eDNA datasets are consistent, reusable, and aligned with global biodiversity data infrastructures. The FAIR eDNA initiative enhances the FAIRness of eDNA data by extending standards like DwC and MIxS with eDNA-tailored metadata terms (Takahashi et al. 2025). 
The FAIRe tools (FAIRe-ator, FAIRe-fier, and FAIRe2MDT) facilitate creation, validation, and conversion of standardised metadata to improve interoperability and reusability across platforms. The Metabarcoding Data Toolkit (MDT) is an open-source web tool that streamlines publishing of eDNA metabarcoding data by converting common data structures [e.g., Operational Taxonomic Unit (OTU) tables, taxonomy, metadata, Format for All Sequences from All Species (FASTA) files] into DwC-A (GBIF Secretariat 2024). This modular design allows the pipeline to accommodate diverse data types and processing workflows while ensuring compatibility with global biodiversity data standards. The pipeline, shown in Fig. 1, provides four main pathways for converting source data into DwC-A and publishing them via the Integrated Publishing Toolkit (IPT): (1) directly publishing data already formatted as DwC-A; (2) transforming source data using a simple DwC pipeline with a custom DwC transformation script; (3) generating the DwC-A file from source data via the MDT tool; and (4) using FAIRe tools and converting the output to DwC-A via the MDT tool or a custom transformation script. The Australian Microbiome (AM) Initiative is a national collaborative research program characterising microbial diversity across Australia’s terrestrial, freshwater, coastal, and marine environments. The AM data pipeline depicted in Fig. 2 transforms data stored in the AM Data Portal using the MDT tool to generate DwC-A for publication to global repositories. 
The Globalising Marine Biodiversity Observations (GLOMBO) Partnership is a collaboration between CSIRO and the Minderoo Foundation aimed at improving the large-scale monitoring of Australia’s vast marine ecosystems by deploying automated eDNA sampling systems to gather samples continuously during voyages, with the first installation taking place on CSIRO’s research vessel Investigator. This approach will be expanded through a network of “ships of opportunity,” encompassing research, commercial, and tourist vessels that contribute to a nationwide eDNA monitoring effort. The scalable data pipeline is proposed to automate, integrate, and disseminate eDNA datasets, enabling comprehensive, real-time insights into marine biodiversity across Australia’s oceans, as illustrated in Fig. 3. Recording eDNA-derived species occurrences presents several challenges. One example is taxonomic ambiguity, often caused by incomplete reference databases like GenBank, the World Register of Marine Species (WoRMS), Barcode of Life Data System (BOLD), or SILVA. Linking eDNA sequence reads to biodiversity occurrence records is complex and requires expert knowledge and infrastructure integrating sequence data, metadata, and taxonomy. Technical barriers, limited engagement, and a lack of incentives to make data accessible hinder open access to eDNA data. OBIS-AU is addressing these challenges by exploring tools like GBIF’s MDT, OBIS’s Pacific Islands Marine bioinvasions Alert Network (PacMAN) pipeline, FAIRe tools, and AI-based tools, as well as providing expert support and developing new automation pipelines to assist data publishers. OBIS-AU has published eDNA data using the DwC Occurrence Core and DNA Derived Data Extension and is now testing a publication model with the DwC Event Core to better capture sampling context and improve integration, interoperability, and reuse of complex eDNA datasets. OBIS-AU intends to align with the new DwC Data Package to support modular publishing of marine biodiversity data.
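The core OTU-table-to-occurrence transformation that tools like the MDT automate can be sketched in a few lines. The sample data below are invented, and the Darwin Core / DNA Derived Data terms shown are a small illustrative subset, not the full mapping such tools produce.

```python
# Invented OTU table: sequence reads per OTU per sampling event.
otu_table = {
    "OTU_1": {"sample_A": 120, "sample_B": 0},
    "OTU_2": {"sample_A": 15, "sample_B": 42},
}
# Invented taxonomic assignments for each OTU.
taxonomy = {"OTU_1": "Mugil cephalus", "OTU_2": "Sepioteuthis australis"}

def to_occurrences(otu_table, taxonomy):
    """Flatten an OTU-by-sample matrix into Darwin Core-style occurrence
    rows, one per non-zero cell (a subset of terms, for illustration)."""
    rows = []
    for otu_id, samples in otu_table.items():
        for event_id, reads in samples.items():
            if reads == 0:  # absent in this sample: no occurrence row
                continue
            rows.append({
                "eventID": event_id,
                "occurrenceID": f"{event_id}:{otu_id}",
                "scientificName": taxonomy[otu_id],
                "organismQuantity": reads,
                "organismQuantityType": "DNA sequence reads",
                "basisOfRecord": "MaterialSample",
            })
    return rows

occurrences = to_occurrences(otu_table, taxonomy)
print(len(occurrences))  # 3 non-zero OTU-by-sample cells
```

In a real DwC-A, these occurrence rows would sit alongside a DNA Derived Data extension file carrying the sequences and marker metadata, keyed by the same identifiers.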
- Research Article
- 10.3897/biss.9.183023
- Dec 23, 2025
- Biodiversity Information Science and Standards
- Susanna Ioni + 4 more
European habitats are classified under a framework developed by the European Topic Centre for Biodiversity for the European Environment Agency, as part of the European Nature Information System (EUNIS) (Davies et al. 2004). All terrestrial, freshwater, and marine habitats follow a hierarchical classification based on physical features, human influence, and dominant vegetation (Moss 2008, Chytrý et al. 2020). Distribution maps are provided and modelled using occurrence data of indicator species collected from vegetation surveys (Hennekens 2017). Although the system may seem accurate, when we first plotted the distribution of the main species of our habitat study case, EUNIS Habitat S22 ‘Alpine and subalpine ericoid heath’ (European Environment Agency 2019), we observed that occurrence data, e.g., from sources like the Global Biodiversity Information Facility (GBIF), often fell outside the mapped areas of the habitat. Furthermore, important occurrence data sources, such as herbaria, were left out of the official distribution mapping, representing, in our view, a significant shortcoming of the EUNIS system. This study addresses these gaps by integrating diverse sources of in situ occurrence data (herbaria, vegetation surveys, citizen science) through a machine learning approach to complement the current EUNIS mapping. Specifically, we modelled the distributions of diagnostic species of Habitat S22 using species distribution models (SDMs). For this purpose, we retrieved occurrence data from GBIF, identified by accepted names as well as taxonomic synonyms, using the R package rgbif (Chamberlain et al. 2025), and utilised the Darwin Core standard (Wieczorek et al. 2012). Data were filtered to include European occurrences with spatial coordinates and coordinate uncertainty of <500 m, restricted to the spring and summer months of 1980–2024. For modelling, the data were stratified onto a 1-km grid. As SDM predictors, we used proxies for macroclimate and topography. 
Climatic predictors included CHELSA Bioclim variables of mean annual temperature, temperature seasonality, annual precipitation, precipitation seasonality, and an aridity index (Zomer et al. 2022). For topography, we used the Copernicus digital terrain model and calculated slope and indices for heat load (McCune and Keon 2002), topographical ruggedness (Riley et al. 1999), and topographical wetness (Beven and Kirkby 1979), using the spatialEco R package (Evans and Murphy 2021) and SAGA GIS (Conrad et al. 2015). Data were integrated into data cubes, and correlations among species occurrences and predictors were tested. We supplemented the occurrence data with pseudo-absences sampled within a buffer around presence points (Fallgatter et al. 2025). We fitted ensemble SDMs weighted by true-skill statistics scores based on independent cross-validation. We modelled two spatial resolutions in two regions: continental Europe at 1-km resolution, and the European Alps at 100-m resolution. Predicted species distributions were aggregated into cumulative distribution maps. Those were further validated by overlap with the distribution of the habitat based on vegetation plots classified by an expert system, as provided by the European Vegetation Archive (EVA), at 1-km resolution. Predictions were also compared with the official EUNIS probability map for Habitat S22. Correlation analyses confirmed the ecological features of Habitat S22 indicated by the EUNIS classification. Our modelled ranges largely overlapped with the distribution of EVA plots and the EUNIS probability map, but also revealed mismatches at lower elevations and in the Scandinavian region. These differences decreased when fewer species were combined in cumulative predictions. 
Our findings show that SDMs based on occurrence data from different sources can validate and refine expert-defined habitat maps, offering a complementary and data-driven approach.
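The occurrence-filtering step this study describes (coordinates present, coordinate uncertainty below 500 m, spring and summer months of 1980–2024) can be sketched as follows. The records are invented, and the month range is an assumption for the sketch, since the abstract does not define "spring and summer" numerically.

```python
# Invented occurrence records with Darwin Core-style field names.
records = [
    {"decimalLatitude": 46.5, "decimalLongitude": 9.8,
     "coordinateUncertaintyInMeters": 120, "month": 7, "year": 2019},
    {"decimalLatitude": None, "decimalLongitude": None,
     "coordinateUncertaintyInMeters": 50, "month": 6, "year": 2001},
    {"decimalLatitude": 61.2, "decimalLongitude": 24.0,
     "coordinateUncertaintyInMeters": 2500, "month": 8, "year": 1995},
]

# Assumption: "spring and summer" taken as March-August.
SPRING_SUMMER = range(3, 9)

def keep(rec):
    """Apply the filtering criteria described above to one record."""
    return (rec["decimalLatitude"] is not None
            and rec["decimalLongitude"] is not None
            and rec["coordinateUncertaintyInMeters"] is not None
            and rec["coordinateUncertaintyInMeters"] < 500
            and rec["month"] in SPRING_SUMMER
            and 1980 <= rec["year"] <= 2024)

filtered = [r for r in records if keep(r)]
print(len(filtered))  # only the first record passes all criteria
```

The second record fails for missing coordinates and the third for excessive coordinate uncertainty, mirroring the two most common reasons GBIF records are dropped before SDM fitting.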
- Research Article
- 10.3897/biss.9.183132
- Dec 23, 2025
- Biodiversity Information Science and Standards
- Majid Vafadar
The Museum für Naturkunde Berlin (MfN) DataHub*1 is an open-source web service and workflow engine developed to execute automated data-integration and migration workflows in continuous and parallel scenarios. Data migration and integration remain major challenges in publishing biodiversity data that follow international standards. To overcome these, a centralized service was created to coordinate and concentrate the computational power required for large-scale data transformation. Deployed at the Museum für Naturkunde Berlin, it now serves as the core of the institution’s scientific data-management infrastructure. This service supports digitization pipelines by iterating through datasets and files to integrate them into designated target systems. Following the ETL (Extract, Transform, Load) principle (Moreau 2015), it extracts data from internal databases and shared storage via secure protocols such as SMB*2 and SFTP*3, transforms and validates them, and loads the compliant outputs through target-system API endpoints. The overall architecture and data flow are shown in Fig. 1. Operations are controlled through a web dashboard that allows execution and monitoring of pipelines either manually, automatically, or with AI-agent assistance. Implemented using the Django Web Framework, the service runs modular Python scripts and exposes all functions through RESTful APIs. An MCP*4 server provides an AI-readable interface, enabling both human- and machine-driven operations. Connected to the museum’s storage systems, the DataHub validates, enriches, and transforms data, with a dedicated validator ensuring that each record meets predefined structural and semantic rules. A persistent integration pipeline imports datasets into the museum’s Specify collection management system and its digital catalog, making data accessible for research and public use. 
It also prepares standardized packages for external partners such as GBIF (Global Biodiversity Information Facility), following the Darwin Core format (Wieczorek et al. 2012) and specific project requirements. Integration with field-data applications like ODK (Open Data Kit) ensures that mobile data collection can enter the same pipeline. All operational steps, including validation, transformation, and API transactions, are fully logged for transparency and reproducibility. A key innovation is the AI-integration layer, which links through the MCP*4 server to an AI agent built with the LangChain library and the Qwen3 LLM*6 model, executed locally via the Ollama platform. This component assists with workflow orchestration by optimizing task order, resource allocation, error recovery, and live reporting, thereby reducing manual supervision. By combining ETL*7 pipelines with AI-assisted orchestration, the DataHub provides a flexible, scalable engine adaptable to different collection domains. Its modular, open-source design promotes reproducibility and extension to new systems. The centralized architecture enhances data quality and FAIR (Findability, Accessibility, Interoperability, and Reusability) compliance (Wilkinson et al. 2016), offering a practical and scalable solution that can be deployed in natural-history institutions aiming to modernize their digital-collection infrastructures and ensure the continuous availability of reliable, high-quality biodiversity data. The open-source code and technical documentation of the MfN DataHub are available on the museum's GitHub repository.*1
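As an illustration of the validation step such an ETL pipeline performs before loading, here is a minimal record validator. The required fields and rules are assumptions for this sketch, not the MfN DataHub's actual schema.

```python
# Illustrative required fields; a real schema would be far richer.
REQUIRED = ("catalogNumber", "scientificName", "eventDate")

def validate(record):
    """Return a list of problems; an empty list means the record may be
    loaded into the target system."""
    problems = [f"missing {f}" for f in REQUIRED if not record.get(f)]
    lat = record.get("decimalLatitude")
    if lat is not None and not -90 <= lat <= 90:
        problems.append("latitude out of range")
    return problems

# Invented example records.
good = {"catalogNumber": "MfN-001", "scientificName": "Parus major",
        "eventDate": "2023-05-04", "decimalLatitude": 52.5}
bad = {"catalogNumber": "MfN-002", "decimalLatitude": 152.0}

print(validate(good))  # []
print(validate(bad))   # ['missing scientificName', 'missing eventDate', 'latitude out of range']
```

Accumulating problems per record rather than failing on the first error makes the validation log useful for curators fixing batches of records at once.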
- Research Article
- 10.1080/00779962.2025.2602985
- Dec 23, 2025
- New Zealand Entomologist
- Madeleine Mccullough + 1 more
ABSTRACT Both citizen science observations and taxonomic collections provide valuable data on species occurrences, typically including species identification, as well as the time and place of observation or collection. Comparing these datasets is intuitive and important for understanding biodiversity patterns. In this study, over 86,000 records of exotic insect species in New Zealand were obtained from the Global Biodiversity Information Facility, comprising two record types: citizen science observations (from iNaturalist) and specimen records (from digitised museum collections). These datasets were compared across taxonomic levels, temporal and geographic scales, and species body size. Key differences emerged between the two data sources. Although the total number of exotic insect records was similar, exotic species accounted for a greater proportion of citizen science observations (1 in 5) than specimen records (1 in 10). Taxonomic composition varied significantly between the datasets at the order, family, and species levels, with citizen science observations disproportionately representing larger-bodied species. Consequently, many exotic insect species present in New Zealand were underrepresented in observation records. Despite these biases, the large volume of citizen science data makes it a valuable resource for biosecurity and invasion biology. Enhancing public awareness of diverse insect groups, including smaller or less conspicuous species, could improve data coverage. Ultimately, leveraging the complementary strengths of both record types will enhance biodiversity monitoring and biosecurity efforts.
- Research Article
- 10.1002/ppp3.70149
- Dec 23, 2025
- PLANTS, PEOPLE, PLANET
- Jed Arno + 10 more
Societal Impact Statement: Understanding and protecting plant life is essential for tackling the twin challenges of biodiversity loss and climate change. To support this, we have developed a new digital approach that helps identify plant species more quickly and accurately. By using images of preserved plant specimens from global collections sourced through the Global Biodiversity Information Facility and combining computer vision technology with expert knowledge from plant scientists, our approach makes it easier to catalogue and study plants. This innovation not only speeds up scientific research but also strengthens the connection between traditional physical plant collections and modern digital collections and tools, helping scientists, conservationists and communities work together to safeguard nature. Summary: Computer vision applied to digital herbarium collections holds tremendous promise to streamline specimen identification and accelerate the work of taxonomists and herbarium curators. We present a sampling and image preprocessing pipeline applicable to any image dataset that uses the Darwin Core data standard. We tested it on Cyperaceae, a large monocot plant family known for its identification challenges, and on Rhamnaceae, a eudicot plant family, to demonstrate broad applicability across angiosperms. Digitised herbarium specimens were sampled via the Global Biodiversity Information Facility to create image datasets with balanced representation, annotated with taxon labels. These were used to train deep learning models at genus level in Cyperaceae and Rhamnaceae, and at species level in the genera Bulbostylis and Ziziphus. A model fine-tuned on the data performed efficiently and consistently achieved top-1, top-3 and top-5 accuracy rates of ≥72%, ≥88% and ≥92% in identifying digitised herbarium specimens of Cyperaceae and Rhamnaceae to genus level. 
Species‐level identification in Bulbostylis reached 65%, 83% and 89%, while Ziziphus achieved higher rates of 72%, 85% and 90%. Our approach integrates an automated pipeline for dataset generation with expert verification to enhance data quality. This framework supports scalable, accurate identification of herbarium specimens and fosters a more dynamic relationship between digital and physical collections.
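The top-1/top-3/top-5 accuracy rates reported above have a simple operational definition: a prediction counts as correct when the true genus (or species) is among the model's k highest-scoring classes. A minimal, framework-agnostic sketch; the scores and labels below are invented for illustration and are not from the study:

```python
from typing import Sequence

def topk_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true class index is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample (stable sort breaks ties)
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in topk:
            hits += 1
    return hits / len(labels)

# Illustrative class scores for 3 specimens over 4 hypothetical genus classes
scores = [
    [0.1, 0.7, 0.1, 0.1],  # true class 1 -> top-1 hit
    [0.4, 0.3, 0.2, 0.1],  # true class 1 -> only a top-2 hit
    [0.2, 0.2, 0.5, 0.1],  # true class 0 -> only a top-3 hit
]
labels = [1, 1, 0]
```

Running `topk_accuracy` for k = 1 and k = 3 on these toy inputs shows how the cumulative rates in the abstract arise: every top-1 hit is also a top-3 and top-5 hit, so the rates can only increase with k.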
- Research Article
- 10.3897/biss.9.182514
- Dec 19, 2025
- Biodiversity Information Science and Standards
- Salza Palpurina + 1 more
Vegetation plots (relevés)—records of all plant species with their abundance in fixed-size plots in terrestrial vegetation—might be independent, arranged in nested designs, or established as permanent plots for resurvey. Given their complexity, representing vegetation plot data in global infrastructures such as the Global Biodiversity Information Facility (GBIF) requires thoughtful use of Darwin Core (DwC) and extensions (Darwin Core Task Group 2009). In DwC, a relevé corresponds to a single dwc:Event, hence, the recommended way to publish such data in GBIF is using the Event Core with at least two additional extensions: Occurrence and Relevé (GBIF 2018). The GBIF Relevé extension provides standard measurements across vegetation layers, with explicitly defined units and precision (e.g., “tree cover (%)”). Other plot-level measurements (e.g., soil pH) can be described via the DwC Measurement or Facts extension. The recently ratified Humboldt extension (TDWG Humboldt Extension Task Group 2024) expands the DwC Event class to capture the sampling context of more complex plot designs by extending the descriptions of sampling events, making it especially relevant for nested and repeated sampling. However, recommendations are still evolving (Ingenloff et al. 2025) and it has not been extensively tested with vegetation plot data (Suppl. material 2). Out of the 374 databases listed in the Global Index of Vegetation‐Plot Databases (GIVD; accessed on 15.09.2025), only a few are currently published via GBIF (e.g., Hennekens 2018, Kuzemko et al. 2024, Swacha et al. 2025), and mostly as Occurrence class datasets. As a case study, we present the dataset by Palpurina (2025) exported from Turboveg 2 (Hennekens and Schaminée 2001), the standard software for vegetation data management in Europe. For this dataset, we used an R script*1 to directly access the database and export the data as Darwin Core Archive (DwC-A) using an Event Core with Occurrence, Relevé and Humboldt extensions. 
Regarding the Humboldt extension, field mapping required one-to-many and many-to-one relationships between Turboveg and DwC. Because some Turboveg 2 fields lacked sufficient detail, the script had to include hardcoded values and manual enrichment steps to populate missing DwC fields and ensure schema compliance. Our talk (Suppl. material 1) addresses challenges in applying the Humboldt extension to vegetation-plot data, focusing on common practices for capturing key data critical for vegetation data interoperability.
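The Event Core layout described above can be illustrated with a toy export: one dwc:Event row per relevé, and one Occurrence extension row per recorded species, keyed back to the parent event via eventID. This is a simplified sketch with invented values, not the Palpurina (2025) dataset or the actual Turboveg 2 export script:

```python
import csv
import io

# One relevé becomes a single dwc:Event row in the Event Core table.
event = {
    "eventID": "releve-001",
    "eventDate": "2024-06-15",
    "samplingProtocol": "Braun-Blanquet releve",
    "sampleSizeValue": "16",
    "sampleSizeUnit": "square metre",
}

# Each plant species recorded in the plot becomes an Occurrence extension
# row, linked to its parent event via the shared eventID key.
occurrences = [
    {"eventID": "releve-001", "scientificName": "Festuca rubra",
     "organismQuantity": "25", "organismQuantityType": "% cover"},
    {"eventID": "releve-001", "scientificName": "Trifolium repens",
     "organismQuantity": "5", "organismQuantityType": "% cover"},
]

def to_csv(rows):
    """Serialise a list of uniform dicts to CSV text, as in a DwC-A data file."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

event_txt = to_csv([event])
occurrence_txt = to_csv(occurrences)
```

In a real Darwin Core Archive these two tables would sit alongside a meta.xml descriptor, with further extension files (Relevé, Humboldt) keyed to the same eventID.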
- Research Article
- 10.3897/neobiota.104.157379
- Dec 19, 2025
- NeoBiota
- Eduardo Chacón-Madrigal + 9 more
Non-native plant species are on the rise globally, yet their distribution patterns and environmental drivers in biodiversity-rich regions such as Central America remain poorly understood. These species are affecting biodiversity, ecosystem integrity, and conservation efforts, especially when they become invasive. We analyzed the spatial distribution of 751 naturalized plant species using more than 42,000 records collected through the Global Biodiversity Information Facility (GBIF) across the seven countries of Central America. We evaluated the influence of environmental variables, human population density, protected areas, and life zones on both occurrence and species richness. Human population density emerged as the strongest predictor of naturalized species occurrence and richness, highlighting the role of human activity in promoting invasions. Annual mean temperature and biodiversity integrity were negatively associated with occurrence and species richness. Tropical rainforests and other humid life zones have more naturalized species than expected by chance. Protected areas had fewer naturalized species overall but a higher species-to-observation ratio, reflecting both their conservation value and their vulnerability. Rare naturalized species, in terms of the number of records, were found outside protected zones, particularly in disturbed and urbanized areas. Our findings highlight the need for early detection, targeted management, and strengthened protection strategies, especially in mid-elevation zones and densely populated areas. By identifying key environmental and anthropogenic drivers and the most affected regions, this study offers actionable insights for conservation planning and invasive species management in one of the world’s most biodiverse and socio-environmentally vulnerable regions.
- Research Article
- 10.1186/s12862-025-02455-y
- Dec 18, 2025
- BMC Ecology and Evolution
- Zhao Wanglin + 8 more
Background The White-browed Crake (Poliolimnas cinereus, family Rallidae, hereafter WbC) is a climate-sensitive bird with a tropical/subtropical distribution in Southeast Asia, Australasia, and the Philippines. Range expansion into higher latitudes would be predicted for this species in a warming climate. In this study, we first photographed a WbC in a park in Motuo County on the southeast Tibetan Plateau. We then compiled geographic data from the Global Biodiversity Information Facility (GBIF) to illustrate its distribution characteristics. We also used a MaxEnt model to simulate its global suitable range under different future climate change scenarios. Results The results showed: (1) this observation constitutes a new distributional record of the WbC on the Tibetan Plateau; this expanded northern boundary (29°19′25.40″N) extends the latitudinal limit of the species by 171.58 km. (2) The coldest monthly minimum temperature, the wettest seasonal precipitation, and the human footprint index were the main environmental factors affecting the distribution of the WbC; the rise in the coldest monthly minimum temperature has facilitated the expansion of the WbC's habitat. (3) Future climate warming will lead to a significant increase in suitable areas for the WbC, with its distribution center shifting 196.11 km and 153.80 km towards the northwest in 2041–2060 and 2081–2100, respectively. Under the scenarios for 2041–2060 and 2081–2100, the globally suitable distribution range of the WbC might expand by 1,125,400 km² and 1,275,200 km², respectively. In China, the corresponding expansions were 27,500 km² and 29,200 km², respectively, mainly distributed in Guangdong, Yunnan, Taiwan, Guangxi, Hainan, Xizang, and Fujian provinces. Conclusions The WbC photographed in Motuo County is a new distribution record of this species on the Tibetan Plateau, with Motuo County in Xizang being the northernmost boundary of the current WbC range.
The wettest seasonal precipitation and the human footprint index were the main environmental factors affecting the distribution of the WbC. Under future climate change scenarios, the WbC's range is expanding rapidly and tends to disperse in a northwesterly direction.
- Research Article
- 10.3897/biss.9.182246
- Dec 17, 2025
- Biodiversity Information Science and Standards
- Esteban Marentes Herrera + 2 more
Colombia is a mega-biodiverse country; however, the number of known species present in the country remains underestimated due to a lack of information, especially for understudied groups that are difficult to sample and identify, such as beetles (Coleoptera). It is not possible to obtain a precise number of species for this group using conventional methods, due to their high diversity and the lack of resources for taxonomic research. For this reason, a modelling exercise was conducted using the Cross Industry Standard Process for Data Mining (CRISP-DM) (Chapman et al. 1999), with occurrence records published through SiB Colombia (Sistema de información sobre biodiversidad de Colombia) and GBIF (Global Biodiversity Information Facility). These records were labeled using checklists of various families and subfamilies compiled by experts of Coleoptera de Colombia, and enriched with relevant abiotic measurements available in open repositories, which were identified as having an impact on the distribution of beetles. The best model was used to indirectly predict the number of beetle species in the Antioquia department, which has a large amount of available data and a variety of representative habitats. Occurrence data for the order Coleoptera were downloaded directly from GBIF as of March 2025 (GBIF 2025a), using a global filter for the order Coleoptera. This dataset contains 32,356,574 rows, which include information on taxonomy, locality, coordinates, date, publishing entity, collector and type of record. Occurrence records of plant species associated with beetles were obtained from GBIF by downloading all biological records of the kingdom Plantae for Colombia (GBIF 2025b) and were spatially intersected with the Coleoptera data using a buffer of 1000 m. The abiotic variables selected included wind speed, relative humidity, precipitation, solar radiation, temperature, ecosystems/habitats, and elevation above sea level; their relevant information can be found in Herrera (2025).
Finally, these data were labeled with the number of species per family of Coleoptera in Colombia, compiled from available checklists published in GBIF via SiB Colombia by the research group Coleoptera de Colombia*1 (Coleoptera de Colombia 2025). These labels were used to annotate the final dataset and served as the dependent variable in the regression models. Four commonly used regression techniques were selected (linear regression, multilayer perceptron, deep neural networks and random forest) after a literature review of the most common methods used for species prediction (Herrera 2025). The models were built in Python and fine-tuned for optimal performance. The data were split into training and testing sets to ensure validity, and the best model was selected based on standard accuracy metrics and confirmed using cross-validation. A scaled deep neural network was selected after achieving the best values on all three metrics and was used to make predictions for the number of species per family in Antioquia and Colombia. The script with all the details is available in Suppl. material 1. For Antioquia, the model predicted a minimum of 2,007, a maximum of 9,381, and an average of 4,210 species of beetles. Using the same model for Colombia, a minimum of 3,226, a maximum of 14,991 and an average of 6,420 species were predicted. These predictions were compared to the last published checklist, which listed 6,170 species recorded from Colombia and was consistent with the number of species described from the country since 1945. The predictions were shared with expert coleopterists, who qualitatively assessed the results.
However, depending on the particular beetle family, the experts considered that the models may have overestimated or underestimated the predicted values, highlighting the general uncertainty in species estimates and showing that artificial intelligence (AI) models cannot reliably predict a single value, serving instead as a starting point or providing a broad range to work with.
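The model-comparison workflow described above (train/test split, several regressors, selection by accuracy metrics confirmed with cross-validation) can be sketched with scikit-learn on synthetic data. The covariates and model settings below are illustrative placeholders, not the study's actual pipeline:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for abiotic covariates (e.g., precipitation,
# temperature, elevation) and a species-count response variable.
X = rng.normal(size=(200, 5))
y = 50 + 10 * X[:, 0] - 5 * X[:, 1] + rng.normal(scale=2, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two of the four techniques named in the abstract, as examples.
models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Compare models by cross-validated R^2 on the training set,
# then confirm the winner on the held-out test set.
cv_scores = {name: cross_val_score(m, X_train, y_train, cv=5).mean()
             for name, m in models.items()}
best_name = max(cv_scores, key=cv_scores.get)
best = models[best_name].fit(X_train, y_train)
test_r2 = best.score(X_test, y_test)
```

On real occurrence-derived data the deep neural network won out, as the abstract reports; on this deliberately linear toy data the linear model will dominate, which is exactly the kind of outcome the cross-validated comparison is designed to surface.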
- Research Article
- 10.3897/biss.9.181272
- Dec 17, 2025
- Biodiversity Information Science and Standards
- Filipi Soares + 8 more
Modeling species names in biodiversity ontologies is particularly difficult in multilingual contexts, where semantic conflation often occurs. A good example is the common name "pimenta": in Brazilian Portuguese it usually refers to Capsicum spp. (chili peppers), while its direct English translation "pepper" often denotes Piper nigrum (black pepper) (Soares et al. 2025a). In Brazilian markets, however, Piper nigrum is more accurately associated with "pimenta-do-reino" ("pimenta-negra"). This issue can be observed on Wikipedia: when translating the Portuguese page for "pimenta" into English, the entry switches from Capsicum spp. to black pepper (Piper nigrum), showing how easily semantic drift can appear in multilingual data modeling. Correctly associating the common names used in agricultural markets with species would help avoid misunderstandings arising from these cultural differences. However, another vocabulary-management challenge then emerges: how to keep species names in ontologies updated as the taxonomy itself changes. Some agricultural controlled vocabularies, such as Agrotermos (Telles et al. 2024), lack automated mechanisms for updating taxonomic classifications. For example, Prochilodus cearensis, Prochilodus scrofa, and Prochilodus margravii are all listed in Agrotermos as preferred terms, i.e., the authorized, standard term selected to represent a concept in a controlled vocabulary, while according to the Global Biodiversity Information Facility (GBIF) Backbone Taxonomy (GBIF Secretariat 2023) these names are synonyms, as shown in Table 1.
When developing the Agricultural Product Types Ontology (APTO), which was designed to represent products traded in Brazilian agricultural markets based on Agrotermos and AGROVOC, we proposed two approaches using generative AI, specifically OpenAI's ChatGPT-4, as a semantic engineering assistant to automate the inclusion of scientific names in the ontology: (1) prompt-based queries with a plugin accessing the GBIF API, and (2) a ChatGPT-generated Python script that converted GBIF taxonomy data into Web Ontology Language (OWL) format. These AI-supported methods automated the construction of APTO's "Organism" module, integrating taxonomic hierarchies and managing synonyms. ChatGPT effectively identified synonymy (e.g., see Table 1) and reduced manual labor in ontology development. The first approach is no longer reproducible, since OpenAI has replaced plugins with GPTs. As such, we are currently developing a GPT named Taxonomy OWLizer 2.0*1, which is an evolution of the first approach described in that paper. Concerns about scalability, reproducibility, and hallucinations (false, made-up information) remain, highlighting the need for expert oversight throughout the process. When ChatGPT was used without API access, hallucinations appeared more frequently. For instance, when asked to check a list of plant species names for typos, it incorrectly suggested that Euterpe edulis was a synonym of Euterpe oleracea, even though both are recognized as distinct species in widely used catalogues such as the GBIF Backbone Taxonomy (Soares et al. 2025a). This case study demonstrates that generative AI can support, but not yet replace, human-led ontology development. It also emphasizes AI's potential contribution to biodiversity informatics, particularly for managing evolving and multilingual vocabularies.
All tools and source code related to our work are archived on Zenodo (Soares et al. 2025b). Detailed protocols are provided in Soares et al. 2025a.
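For comparison, the synonym lookup itself does not require a generative-AI intermediary: GBIF's species/match endpoint reports a name's status and, for synonyms, the accepted species directly. The sketch below parses the general shape of such a response; the response dict is an abbreviated, illustrative example rather than a live API result, and the accepted name shown is an assumption for demonstration:

```python
def resolve_name(match: dict) -> dict:
    """Reduce a GBIF species/match-style response to the fields an
    ontology pipeline needs: the name's status and, for synonyms,
    the accepted species to use as the preferred term."""
    status = match.get("status", "NONE")
    return {
        "canonical": match.get("canonicalName"),
        "status": status,
        # For synonyms, GBIF reports the accepted species separately.
        "accepted": match.get("species") if status == "SYNONYM"
                    else match.get("canonicalName"),
    }

# Abbreviated, illustrative response for a synonymised name
# (real responses carry many more fields, e.g. usageKey, confidence).
example = {
    "canonicalName": "Prochilodus scrofa",
    "status": "SYNONYM",
    "species": "Prochilodus lineatus",
}
resolved = resolve_name(example)
```

Because the parsing is deterministic, a script like this avoids the hallucination risk noted above; the generative-AI layer is only needed for the harder task of converting the resolved taxonomy into OWL.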
- Research Article
- 10.3897/biss.9.182085
- Dec 16, 2025
- Biodiversity Information Science and Standards
- Giulia Micai + 2 more
Eduard Hackel (1850–1926) was an Austrian botanist best known for his pioneering work on grasses (Poaceae), becoming one of the most famous and respected agrostologists of his time. In 2024, the botanical department of the Natural History Museum of Vienna received as a donation a collection of more than 700 letters sent to Eduard Hackel between 1870 and 1932 by botanists writing from 40 different countries in 6 different languages. In botany, supplementary materials such as the personal correspondence of scientists, field notes or diaries hold great potential, since they might contain the only documentation of a scientist’s thought processes, ideas and observations, filling in gaps on missing or poorly documented specimens and ideas. They may record information that is otherwise hard to deduce from the specimens alone, which can be vital for scientific research (Rinaldo et al. 2013). In order to make the information contained in these types of materials findable and searchable by anyone and from anywhere, it needs to be digitised and transcribed. Transcription projects are time-consuming, intellectually intensive, and expensive for an organisation to set up and manage. Crowdsourcing is a sustainable strategy for transcribing large collections and enhancing descriptive metadata. Despite the increasingly prominent role of artificial intelligence, crowdsourcing remains an indispensable part of the future of libraries, museums, and archives because it solves problems, strengthens the sense of community among users and the public, adds value to the collection, and creates engagement. The value of scientists’ private correspondence as a historical source has long been recognised, but recently "correspondence networks," which connect several people into webs of exchange, have become distinct objects of historical study in their own right (Ogilvie 2016).
The transcription process started with the AI-powered platform Transkribus, which enables the transcription of handwritten text into machine-readable and searchable texts. The challenges we faced in the transcription are intrinsic to the nature of our project: dealing with old, obsolete handwritten scripts (e.g., old Kurrent in German); letters written by different senders; and letters in different languages. To overcome these challenges, we had to use different transcription models for the AI-driven transcription and rely on volunteers with different linguistic backgrounds and skills. After transcription, the letters were uploaded to Goobi Workflow, a tool for the management and display of digitisation projects. Transcribing the letters made it possible to extract relevant data from the texts about botanical specimens and their collectors, and allowed the linkage of biographical data on Wikidata and Bionomia with specimen data on JACQ and GBIF (Global Biodiversity Information Facility). With the use of Transkribus and the help of our volunteers, we were able to transcribe 864 pages and identify 140 senders. Each of the botanists was linked to their Q-number, which enables linkage to their biographical data on Wikidata and Bionomia. In various letters, the sender included a botanical specimen and requested E. Hackel's help with its identification. In several cases, thanks to the information contained in the text, we have been able to identify the specimen within E. Hackel's personal herbarium, deposited at the Natural History Museum of Vienna. Among the senders, we identified 17 Italian scientists. One of the publicly accessible outcomes of the investigation of Hackel's exchange with the 17 Italians is the online exhibition on the Europeana platform in an English*1 and an Italian*2 version. A second result was the curated story of E.
Hackel and his 17 Italian correspondents by the Transforming European Taxonomy through Training, Research and Innovations (TETTRI) project in the form of a curated website*3, which links their biographical data from Wikipedia, Wikidata, and Bionomia, and visuals from Wikimedia. Furthermore, the letters are publicly available for consultation, together with their transcriptions*4, via Transkribus*5 sites. This contribution provides an insight into the possibilities of extracting data from natural history collections and interlinking them to improve their usability and accessibility for research institutions and the general public.
- Research Article
- 10.3897/biss.9.181877
- Dec 10, 2025
- Biodiversity Information Science and Standards
- Elspeth Haston + 6 more
The digitisation of the world’s natural science collections is expanding massively and providing a unique global resource for answering some of the most fundamental bio- and geodiversity questions. However, digitisation at this scale can only be done in stages, increasing the variation in the level of digitisation both between and within collections. The ability to measure and monitor the level of digitisation of each individual specimen and, by extension, each collection on a national or global scale has never been more important. The Minimum Information about a Digital Specimen (MIDS) standard is being developed to provide an international digitisation standard within the Biodiversity Information Standards (TDWG) organisation. The standard, along with implementations which can calculate the MIDS level of specimens and, by extension, datasets, provides users with tools to help develop a digitisation strategy as well as plan, manage and monitor a digitisation programme, including prioritisation and data enhancement. For researchers, whilst the MIDS level of published specimens does not provide information about the quality of the data present, it does indicate the expected amount of associated data for intended analyses, including geographic coordinates and identifiers. The MIDS website provides access to the current draft of the standard. The four MIDS levels (MIDS0 to MIDS3) are described, and for each level the purpose is included. The purpose has been key to defining the information that is required to be present. The information recorded for each specimen is categorised into information elements. A detailed schema for the information elements includes the label, definition, usage note, purpose and examples, as well as the disciplines for which each element is required (Biology, Geology, Palaeontology). The information elements required for each MIDS level are cumulative, with each level adding additional information relevant for the purpose of the level.
Specimen data recorded in collection databases and submitted to international aggregators such as the Global Biodiversity Information Facility (GBIF) need to be mapped to the information elements to enable the calculation of the MIDS level. The website explains how the Simple Standard for Sharing Ontology Mappings (SSSOM) (Matentzoglu et al. 2022) is being used to map Darwin Core (DwC) (Wieczorek et al. 2012) and Access to Biological Collection Data (ABCD) (Access to Biological Collection Data task group 2007) terms to MIDS. It provides a tabulated quick reference of MIDS mappings with a filter option to enable users to quickly review the mapping by MIDS level or by information element. Several tools implementing MIDS have been developed to calculate the MIDS levels of datasets, and the website provides links to these. As the MIDS standard is not yet ratified, there have been updates which are not yet reflected in all the tools currently available. There is a reference implementation as part of the open Digital Specimen model within the Distributed System of Scientific Collections (DiSSCo), where the MIDS level is calculated for each digital specimen. The MIDS Calculator can be used to calculate the MIDS score for DwC archive and ABCD biological datasets based on the current version of MIDS. Additional functionality for geological and palaeontological datasets is being developed. As the MIDS standard approaches public review, we encourage curators and collection managers to test the functionality on their collection data and provide feedback via the MIDS GitHub repository.
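The cumulative rule described above (each level adds elements on top of those required by all lower levels) maps naturally onto a small checker: a record attains a MIDS level only if that level's elements and those of every lower level are present. The element names below are illustrative placeholders, not the normative requirements of the draft standard:

```python
# Illustrative (NOT normative) information elements per MIDS level;
# consult the MIDS standard itself for the real requirements.
MIDS_ELEMENTS = {
    0: {"physicalSpecimenID"},
    1: {"name", "modified"},
    2: {"countryCode", "collectingAgent"},
    3: {"decimalLatitude", "decimalLongitude"},
}

def mids_level(record: dict) -> int:
    """Highest MIDS level for which this level's and every lower level's
    elements are present and non-empty. Returns -1 if even level 0 fails."""
    level = -1
    for lvl in sorted(MIDS_ELEMENTS):
        if all(record.get(e) for e in MIDS_ELEMENTS[lvl]):
            level = lvl
        else:
            break  # levels are cumulative: the first gap caps the score
    return level

# A specimen with coordinates but no collector cannot reach MIDS2,
# even though it satisfies the (illustrative) MIDS3 elements.
record = {
    "physicalSpecimenID": "NHMW-12345",
    "name": "Carex flacca",
    "modified": "2024-01-01",
    "countryCode": "AT",
    "decimalLatitude": "48.2",
    "decimalLongitude": "16.4",
}
level = mids_level(record)
```

The example record illustrates why the cumulative design matters for digitisation planning: adding one missing element at a low level can raise a record's score more than enriching it with higher-level data.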
- Research Article
- 10.53516/ajfr.1779655
- Dec 10, 2025
- Anadolu Orman Araştırmaları Dergisi
- Ayşegül Tekeş Düdükçü
Background and Aims This study aims to predict the current and future potential distribution areas of Anemone coronaria L., an important geophyte species naturally distributed in Türkiye. Methods The species’ potential distribution modelling and mapping were conducted using the MaxEnt (Maximum Entropy) method. Current climate data were obtained from the WorldClim database, while future projections for the year 2100 were derived from UKESM1-0-LL global climate model outputs under four Shared Socioeconomic Pathways (SSP) scenarios. Species occurrence records were sourced from the Global Biodiversity Information Facility (GBIF) database. Results The modelling results indicated Area Under the Curve (AUC) values of 0.938 for the training dataset and 0.933 for the test dataset, classifying the model’s performance as “excellent.” The environmental variables most influencing the species’ potential distribution were Annual Precipitation (BIO12), Precipitation of the Warmest Quarter (BIO18), Temperature Annual Range (BIO7), ruggedness, and elevation. Currently, A. coronaria shows high habitat suitability concentrated along the coastal zones of the Aegean, Mediterranean, and southern Marmara regions. Future projections suggest varying degrees of habitat contraction depending on the severity of climate change. Conclusions A. coronaria is highly sensitive to climate change, and future habitat contraction could adversely affect its ecological, aesthetic, and cultural values. This study contributes to the sustainable management of A. coronaria and provides a model framework for the conservation of other climate-sensitive Mediterranean flora.
- Research Article
- 10.1073/pnas.2519119122
- Dec 9, 2025
- Proceedings of the National Academy of Sciences
- Dirk Steinke + 5 more
Here, we present an analysis of the growth and use of the Global Biodiversity Information Facility (GBIF) over the last 5 years. GBIF is the world’s largest data integrator for biodiversity information and plays a central role in research across the biodiversity and evolutionary science community. With the help of a comprehensive bibliographic dataset comprising 12,193 studies that used GBIF-mediated data, we demonstrate how the global scientific community utilizes the continuously fast-growing amount of open and Findable, Accessible, Interoperable, and Reusable biodiversity data in their research. Overall, more researchers engage with GBIF data, a potential consequence of the rising demands of more global environmental assessments, in which GBIF-mediated data are used as a key resource for biodiversity research. Studies utilizing species distribution modeling were most prevalent, and data were most often used for topics related to challenges of the Anthropocene (conservation, climate change, invasive and pest species). More studies used observational data records, a category that also includes a substantial amount of citizen science data. Our data show that a thematic diversification of GBIF-using literature is accompanied by a rapid diversification of both the additional datasets that GBIF data are analyzed with and the new analytical approaches taken by researchers. This emphasizes the growing importance of GBIF’s data infrastructure and services in supporting global science, and reflects major shifts in applied research that require GBIF and similar data infrastructures to evolve rapidly in order to remain relevant for research.