Abstract

‘Big data’ are becoming common in biological oceanography with the advent of sampling technologies that can generate multiple, high-frequency data streams. Given the need for ‘big’ data in ocean health assessments and ecosystem management, identifying and implementing robust and efficient processing approaches is a challenge for marine scientists. Using a large plankton imagery data set, we present two crowd-sourcing approaches applied to the problem of classifying millions of organisms. The first used traditional crowd-sourcing by asking the public to identify plankton through a web-interface. The second challenged the data science community to develop algorithms via an industry partnership. We found traditional crowd-sourcing was an excellent way to engage and educate the public while crowd-sourcing data scientists rapidly generated multiple, effective solutions. As the need to process and visualize large and complex marine data sets is expected to grow over time, effective collaborations between oceanographers and computer and data scientists will become increasingly important.

Highlights

  • Using a large plankton imagery data set, we present two crowd-sourcing approaches applied to the problem of classifying millions of organisms

  • While novel analytical techniques for processing large and complex ecological data sets, such as machine learning and crowd-sourcing, are increasingly reported in the terrestrial literature (Kelling et al., 2013; Peters et al., 2014), marine examples are limited (Wiley et al., 2003; Dugan et al., 2013; Millie et al., 2013; Shamir et al., 2014). Given this paucity and the need to use “big” biological oceanography and marine ecology data for rapid assessment of ocean health and adaptive management of ecosystems, we present here an evolution of approaches applied to the problem of efficiently classifying tens of millions of images of individual plankters generated by the In Situ Ichthyoplankton Imaging System (ISIIS)

  • We found that traditional crowd-sourcing was an excellent way to engage and educate a broad spectrum of the public, while simultaneously applying human capital to a labor-intensive task


Summary

Introduction

Given the need for “big” data in ocean health assessments and ecosystem management, identifying and implementing robust and efficient processing approaches is a challenge for marine scientists. Using a large plankton imagery data set, we present two crowd-sourcing approaches applied to the problem of classifying millions of organisms. A key indicator of the shift to “big data” in biological oceanography is the precipitous rise in data set size and complexity, driven by increased spatial, temporal, and taxonomic resolution and by increased rates of data generation (Table S1; Figure S1).

