Abstract

iEcology supplements traditional ecological data by sourcing large quantities of media from the internet. Images and their metadata are widely available online and can provide information on species occurrence, behaviour and visible traits. However, these data are inherently noisy and data quality varies considerably between sources. Many iEcology studies use data from a single source for simplicity and efficiency; hence, a tool is needed to compare the suitability of different media sources for addressing a particular research question.

We provide a simple, novel way to estimate the fraction of images within multiple unverified datasets that potentially depict a specified target fauna. Our method, the Sum of Tag Frequency Differences (STFD), uses any pretrained, general-purpose image classifier. One of the method's innovations is that it does not require training the classifier to recognise the target fauna. Instead, STFD analyses the frequencies of the generic text tags returned by the classifier for multiple datasets and compares them to the corresponding frequencies from an authoritative image dataset that depicts only the target organism. From this comparison, STFD allows us to deduce the fraction of images of the target in unverified datasets.

To validate the STFD approach, we processed images from five sources: Flickr, iNaturalist, Instagram, Reddit and Twitter. For each media source, we conducted an STFD analysis of three fauna invasive to Australia: cane toads (Rhinella marina), German wasps (Vespula germanica), and the higher-level colloquial taxonomic classification, "wild rabbits". We found that the STFD provided an accurate assessment of image source relevance across all data sources and target organisms, demonstrated by the consistent, very strong correlation (toads r ≥ 0.97, wasps r ≥ 0.95, wild rabbits r ≥ 0.95) between STFD predictions and the fraction of target images in a source dataset observed by a human expert.

The STFD provides a low-cost, simple and accurate comparison of the relevance of online image sources to specific fauna for iEcology applications. It does not require expertise in machine learning or training species-specific neural-network classifiers. The method enables researchers to assess multiple image sources and select those warranting detailed investigation for the development of web-scraping tools, citizen science campaigns, further monitoring or analysis.
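
The tag-frequency comparison at the heart of STFD can be illustrated with a short sketch. The Python snippet below is a minimal, illustrative example, assuming tag lists have already been obtained from a pretrained general-purpose classifier; the function names and placeholder data are hypothetical, not the authors' implementation, and the score computed here is simply a sum of absolute frequency differences rather than the full published procedure for deducing the target fraction.

```python
# Minimal sketch of a tag-frequency comparison in the spirit of STFD.
# Tag lists would come from a pretrained, general-purpose image classifier;
# here they are hard-coded placeholders. Names and data are illustrative
# assumptions, not the authors' implementation.
from collections import Counter


def tag_frequencies(tag_lists):
    """Relative frequency of each tag across all images in a dataset."""
    counts = Counter(tag for tags in tag_lists for tag in tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def stfd_score(reference_freqs, candidate_freqs):
    """Sum of absolute tag-frequency differences between two datasets.

    Smaller values indicate a tag profile closer to the authoritative
    reference, i.e. a dataset more likely to depict the target fauna.
    """
    all_tags = set(reference_freqs) | set(candidate_freqs)
    return sum(
        abs(reference_freqs.get(t, 0.0) - candidate_freqs.get(t, 0.0))
        for t in all_tags
    )


# Authoritative images of the target (e.g. verified cane toad photos).
reference_tags = [["toad", "amphibian"], ["toad", "pond"], ["amphibian", "eye"]]
# Unverified images scraped from a social-media source.
candidate_tags = [["toad", "pond"], ["dog", "grass"], ["toad", "amphibian"]]

reference = tag_frequencies(reference_tags)
candidate = tag_frequencies(candidate_tags)
print(f"STFD score: {stfd_score(reference, candidate):.3f}")
```

In practice, the same score would be computed for each candidate source (e.g. Flickr, iNaturalist, Instagram, Reddit, Twitter) against the one authoritative reference, and the sources ranked by how closely their tag profiles match it.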
