Big Data for Beginners

Pieter Huybrechts

doi:10.3897/biss.7.111301

Abstract

With the increasing number of datasets being published and made available through global aggregators, such as the Global Biodiversity Information Facility (GBIF), new opportunities have opened to answer research questions that previously could not be considered. Techniques for large-scale data integration offer benefits for the biodiversity research community (Heberling et al. 2021, Kays et al. 2020), profiting from the great and continuing efforts in data mobilisation and standardisation (such as Darwin Core, Wieczorek et al. 2012). These benefits include integrating several large data sources and enriching existing occurrence data with other information. Several barriers to the large-scale use of biodiversity occurrence data are commonly encountered. These include the lack of facilities for local storage of large and rapidly changing datasets, the computational power required for processing, unfamiliarity with existing toolsets, and insufficient resources to maintain big data infrastructure. These challenges are well documented in the context of high-throughput genomics (Marx 2013), and more recently in occurrence-based biodiversity research (for example Thessen et al. 2018). However, while these hurdles and bottlenecks are very real, several of them have solutions with a low cost of entry. The aim of this presentation is to encourage the community to explore ambitious queries, to combine and examine the available data in its entirety, and to break down specific technical barriers, by providing a practical overview that helps researchers make the most of large-scale data processing in their work. While big data processing may seem daunting, tools such as Databricks Community Edition or Apache Arrow are accessible to users without a background in big data and are available for both local workstations and cloud computing services, allowing for scalable data processing at low cost. Using these resources, researchers can incorporate larger datasets into existing protocols and, by doing so, uncover patterns and insights that would otherwise be impossible to obtain from smaller subsets of the ever-expanding, complex body of biodiversity occurrence data.

Journal: Biodiversity Information Science and Standards
Publication Date: Aug 18, 2023
License type: CC BY 4.0
