Abstract

There is a wealth of information about the characteristics (traits) of organisms within collections of taxonomic descriptions of plants and animals called a ‘Flora’ or ‘Fauna’ of a region. However, such knowledge is usually encoded as text paragraphs, and is thus unavailable for immediate analysis. To make use of the knowledge embedded in taxon descriptions, text must be organised into standardised, queryable datasets. Despite the recent development of natural language processing (NLP) tools that analyse taxonomic descriptions to extract trait values, the complexity and specificity of these methods currently limit their broad application. Accessible and flexible methods for extracting traits across large numbers of taxonomic descriptions are therefore needed. Here we present such an R-based workflow, which can be adapted for use on any organismal group using a language familiar to researchers in the biological sciences. We document a way to (1) assemble tens of thousands of taxonomic descriptions into a standardised format, (2) split the taxon descriptions into different topics, (3) extract trait values as defined by the user, and (4) assign traits described at the genus and family level to lower-level taxa to maximise trait coverage. As a case study, we apply the workflow to a collection of taxonomic descriptions drawn from Australia's state and national floras and describe useful techniques for creating workflows and thereby research-grade trait datasets. Using this method, we were able to extract 615,812 trait values for 38 different plant traits. Trait data collated using this method are freely available as part of the AusTraits trait database and have already contributed to analyses in several scientific publications.
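
To make steps (3) and (4) concrete, the following is a minimal sketch of pattern-based trait extraction and genus-to-species gap filling. It is not the authors' code: the workflow described above is R-based, and Python is used here only for illustration. The trait names, regular expressions, function names and example descriptions are all hypothetical assumptions.

```python
import re

# Hypothetical, user-defined trait patterns (not taken from the paper).
LIFE_FORM = re.compile(r"\b(tree|shrub|herb|climber)s?\b", re.IGNORECASE)
LEAF_LENGTH_MM = re.compile(
    r"leaves\s+(\d+(?:\.\d+)?)\s*[-\u2013]\s*(\d+(?:\.\d+)?)\s*mm",
    re.IGNORECASE,
)


def extract_traits(description):
    """Step (3): pull trait values out of one free-text description."""
    traits = {}
    match = LIFE_FORM.search(description)
    if match:
        traits["life_form"] = match.group(1).lower()
    match = LEAF_LENGTH_MM.search(description)
    if match:
        # Store the reported range as (minimum, maximum) in millimetres.
        traits["leaf_length_mm"] = (float(match.group(1)), float(match.group(2)))
    return traits


def fill_from_genus(species_traits, genus_traits):
    """Step (4): give each species the traits recorded for its genus,
    keeping any value already recorded at the species level."""
    filled = {}
    for species, traits in species_traits.items():
        genus = species.split()[0]  # assumes "Genus epithet" naming
        merged = dict(genus_traits.get(genus, {}))
        merged.update(traits)  # species-level values take precedence
        filled[species] = merged
    return filled


if __name__ == "__main__":
    species = {
        "Acacia exampleae": extract_traits("Leaves 50-200 mm long."),
        "Acacia otherii": extract_traits("Shrub to 2 m. Leaves 10-25 mm long."),
    }
    genus = {"Acacia": {"life_form": "tree"}}
    for name, traits in fill_from_genus(species, genus).items():
        print(name, traits)
```

In this sketch a species-level value always overrides the genus-level one; whether that precedence is appropriate for a given trait is a design choice for the user.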
