Incomplete species lists derived from global and regional specimen‐record databases affect macroecological analyses: A case study on the vascular plants of China

Hong Qian, Cui Xiao, Jan Beck, Yi Jin, Hang Sun, Tao Deng, Keping Ma

doi:10.1111/jbi.13462

Abstract

Aim
Online species distribution data, whether global or regional, are increasingly used in biodiversity studies, although data limitations have been reported. For the two major databanks in our research region, we explore how incomplete the data are at a large taxonomic and geographic scale, how this incompleteness is affected by spatial grain, and to what degree it affects inferences from analyses relating species richness or turnover to the environment.

Location
China.

Major taxa studied
Vascular plants.

Methods
We assembled species lists of all vascular plants from the Global Biodiversity Information Facility (GBIF) and the National Specimen Information Infrastructure (NSII) at three spatial scales (national, provincial, county). We determined the completeness of each compilation by comparing its number of species with that of an inventory‐based species list (for 28 provinces, 14 counties, and 146 nature reserves within counties). We related richness from each data source (GBIF, NSII, inventory) to environmental variables (temperature, precipitation, elevational range) and compared the regression models among the three data sources. We quantified floristic similarity between regions based on each of the three data sources and related species turnover to geographic and environmental distances.

Results
Data incompleteness was prevalent at all spatial grains; it increased with decreasing grain and was higher for GBIF than for NSII. At the national scale, GBIF included 64.1% and NSII included 89.4% of the true species richness of China. At the county scale, these figures dropped to averages of 12.7% for GBIF and 60.0% for NSII. These differences changed the order and significance of the environmental determinants of richness in regression models. The relationship between floristic similarity and geographic or environmental distance was shallower for GBIF and NSII than for inventory data. When the GBIF data were supplemented with the NSII data, the completeness of GBIF increased from 64.1% to 90.8% at the national scale, from 37.2% to 89.0% at the provincial scale, and from 12.7% to 63.0% at the county scale.

Main conclusions
Major specimen‐record databases are incomplete, which can heavily affect the ecological patterns and mechanisms inferred from these data. Biodiversity analyses based on raw species lists from such sources should be viewed with the utmost caution.
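
The completeness measure, the supplementation step, and the similarity-versus-distance comparison reduce to simple arithmetic on species lists. The Python sketch below illustrates them under stated assumptions: completeness is taken as the database's species count divided by the inventory's species count (as the Methods describe), supplementing GBIF with NSII is modeled as a set union of names assumed to be already taxonomically standardized, and turnover uses the Sørensen index with an ordinary least-squares slope of similarity against distance. The abstract does not name the similarity index, so Sørensen is an assumption, and all function names, species, regions, and distances are hypothetical toy values, not data from the study.

```python
"""Toy illustration of list-based completeness and distance-decay metrics.

Every species name, region, and distance here is hypothetical.
"""
from itertools import combinations


def completeness(db_species: set[str], inventory_species: set[str]) -> float:
    """Database species count as a percentage of the inventory species count."""
    return 100.0 * len(db_species) / len(inventory_species)


def sorensen(a: set[str], b: set[str]) -> float:
    """Sorensen similarity (assumed index): 2 * shared / (richness_a + richness_b)."""
    return 2.0 * len(a & b) / (len(a) + len(b))


def ols_slope(x: list[float], y: list[float]) -> float:
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def distance_decay_slope(lists: dict[str, set[str]],
                         dist: dict[tuple[str, str], float]) -> float:
    """Slope of pairwise similarity against pairwise distance over all region pairs."""
    xs, ys = [], []
    for r1, r2 in combinations(sorted(lists), 2):
        xs.append(dist[(r1, r2)])
        ys.append(sorensen(lists[r1], lists[r2]))
    return ols_slope(xs, ys)


# Hypothetical county: an inventory list plus two sparser database lists.
inventory = set("abcdefghij")          # 10 species
gbif = {"a", "b"}                      # 2 species
nsii = {"a", "c", "d", "e", "f"}       # 5 species
supplemented = gbif | nsii             # GBIF supplemented with NSII: 6 species

# Hypothetical regions and pairwise geographic distances (km).
inv_lists = {"R1": set("abcd"), "R2": set("abce"), "R3": set("afgh")}
db_lists = {"R1": set("ab"), "R2": set("abe"), "R3": set("af")}
dist = {("R1", "R2"): 100.0, ("R1", "R3"): 300.0, ("R2", "R3"): 250.0}

if __name__ == "__main__":
    print(completeness(gbif, inventory))          # 20.0
    print(completeness(nsii, inventory))          # 50.0
    print(completeness(supplemented, inventory))  # 60.0
    print(distance_decay_slope(inv_lists, dist))
    print(distance_decay_slope(db_lists, dist))
```

With these toy lists the database-based slope is shallower than the inventory-based one, which is the kind of contrast the Results describe; a real analysis would also use environmental distances and many more regions. Because a count ratio can exceed 100% when a database records names absent from the inventory, counting only shared names (`len(db_species & inventory_species)`) is one possible alternative.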
