Abstract

Automated monitoring of websites that trade wildlife is increasingly necessary to inform conservation and biosecurity efforts. However, e-commerce and wildlife trading websites can contain a vast number of advertisements, an unknown proportion of which may be irrelevant to researchers and practitioners. Given that many wildlife-trade advertisements have an unstructured text format, automated identification of relevant listings has not traditionally been possible, nor attempted. Other scientific disciplines have solved similar problems using machine learning and natural language processing models, such as text classifiers. Here, we test the ability of a suite of text classifiers to extract relevant advertisements from wildlife trade occurring on the Internet. We collected data from an Australian classifieds website where people can post advertisements for their pet birds (n = 16.5k advertisements). We found that text classifiers can predict, with a high degree of accuracy, which listings are relevant (ROC AUC ≥ 0.98, F1 score ≥ 0.77). Furthermore, in an attempt to answer the question 'how much data is required to have an adequately performing model?', we conducted a sensitivity analysis by simulating decreases in sample size and measuring the subsequent change in model performance. From our sensitivity analysis, we found that text classifiers required a minimum of 33% of our dataset (c. 5.5k listings) to accurately identify relevant listings, providing a reference point for future applications of this sort. Our results suggest that text classification is a viable tool that can be applied to the online trade of wildlife to reduce time dedicated to data cleaning. However, the success of text classifiers will vary depending on the advertisements and websites, and will therefore be context dependent. Further work to integrate other machine learning tools, such as image classification, may provide better predictive abilities in the context of streamlining data processing for wildlife-trade-related online data.
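The sensitivity analysis described above can be illustrated with a minimal sketch: train on progressively smaller random subsamples and score each model against the same held-out listings. Everything here is an assumption for illustration, not the study's procedure: the `fit_keyword_model` classifier is a deliberately crude stand-in, the listings are invented, and the fractions and fixed test set are hypothetical choices.

```python
import random

def f1_score(y_true, y_pred, positive="relevant"):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def fit_keyword_model(rows):
    """Toy stand-in classifier: keep words seen only in relevant listings."""
    relevant_words, irrelevant_words = set(), set()
    for text, label in rows:
        target = relevant_words if label == "relevant" else irrelevant_words
        target.update(text.lower().split())
    return relevant_words - irrelevant_words

def predict_keyword_model(keywords, text):
    """Predict 'relevant' if the listing shares any word with the keyword set."""
    return "relevant" if keywords & set(text.lower().split()) else "irrelevant"

def sensitivity_analysis(train_rows, test_rows, fractions, seed=0):
    """Return {fraction: F1} for models trained on random subsamples."""
    rng = random.Random(seed)
    y_true = [label for _, label in test_rows]
    results = {}
    for fraction in fractions:
        n = max(1, round(fraction * len(train_rows)))
        subsample = rng.sample(train_rows, n)
        keywords = fit_keyword_model(subsample)
        y_pred = [predict_keyword_model(keywords, text) for text, _ in test_rows]
        results[fraction] = f1_score(y_true, y_pred)
    return results

# Invented example listings (hypothetical data).
train_rows = [
    ("cockatiel for sale", "relevant"),
    ("budgie pair", "relevant"),
    ("bird cage for sale", "irrelevant"),
    ("seed mix", "irrelevant"),
]
test_rows = [
    ("young budgie", "relevant"),
    ("cockatiel hand raised", "relevant"),
    ("large cage", "irrelevant"),
    ("seed bag", "irrelevant"),
]
results = sensitivity_analysis(train_rows, test_rows, [0.25, 0.5, 1.0])
```

Plotting `results` against the training fraction would give a learning curve; the point at which the score stops improving is one way to read off a minimum workable sample size.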

Highlights

  • The global wildlife trade is a major concern for biodiversity conservation and biosecurity enforcement [1]

  • Text classification can be a highly accurate method to extract relevant listings of wildlife found on the Internet

  • Text classification models are commonplace in other disciplines and industries that work heavily with text data (e.g., [11]), yet have not been applied to data collected on the wildlife trade occurring on the Internet


Summary

Introduction

The global wildlife trade is a major concern for biodiversity conservation and biosecurity enforcement [1]. Data gathered from the Internet are typically not immediately ready for analysis (i.e., they are ‘messy’) and must be cleaned or processed to identify the desired attributes for subsequent analysis [6]. This is especially true for classifieds, forums, and social media sites where human users type their advertisements into an open (or ‘free form’) text box. A useful but unexplored application is to predict and extract relevant online listings based on their text, which could save time in manual data processing steps if many irrelevant listings exist in the dataset. In the context of wildlife-trade data derived from the Internet, text classification models have the potential to identify relevant listings and remove irrelevant listings that do not sell wildlife (e.g., fish tanks, bird cages, food) by using the words in the listings. We examine the efficacy of text classification models in predicting the relevance of wildlife trade advertisements on the Internet. Our results imply that text classification can be an incredibly useful time-saver when cleaning wildlife-trade data that are structurally (textually) similar to the data we explore here.
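To make the idea of classifying listings "by using the words in the listings" concrete, here is a minimal sketch of a bag-of-words text classifier. It assumes a multinomial naive Bayes model with add-one smoothing, chosen only because it is compact; the source does not say which classifiers were tested, and the example listings and labels are invented.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace; real listings would need more cleaning."""
    return text.lower().split()

def train_naive_bayes(texts, labels):
    """Count words per label to fit a multinomial naive Bayes model."""
    word_counts = {}              # label -> Counter of word frequencies
    doc_counts = Counter(labels)  # label -> number of training listings
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = tokenize(text)
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}

def predict(model, text):
    """Return the label with the highest log-posterior for the listing text."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["doc_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for token in tokenize(text):
            if token in model["vocab"]:        # skip words never seen in training
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented example listings (hypothetical data).
texts = [
    "hand raised cockatiel for sale",
    "pair of budgies with cage",
    "young lorikeet needs new home",
    "large bird cage for sale",
    "parrot food and seed mix",
    "fish tank with stand",
]
labels = ["relevant", "relevant", "relevant",
          "irrelevant", "irrelevant", "irrelevant"]
model = train_naive_bayes(texts, labels)
```

Calling `predict(model, "hand raised budgies")` labels a new listing from word counts alone. A real application would need far more labelled listings and a held-out test set before the predictions could be trusted.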
