Abstract

With the construction and deployment of seafloor observatories around the world, massive amounts of oceanographic measurement data have been gathered and transmitted to data centers. The increase in the amount of observed data not only supports marine scientific research but also raises the requirements for data quality control, as scientists must ensure that their research outcomes come from high-quality data. In this paper, we first analyzed and defined the data quality problems occurring in the East China Sea Seafloor Observatory System (ECSSOS). We then proposed a method to detect and repair the data quality problems of seafloor observatories. Incorporating data statistics and expert knowledge from domain specialists, the proposed method consists of three parts: a general pretest to preprocess data and route it to further processing, data outlier detection methods to label suspect data points, and a data interpolation method to fill in missing and suspect data. The autoregressive integrated moving average (ARIMA) model was improved and applied to seafloor observatory data quality control by using a sliding window and cleaning the input modeling data. Furthermore, a quality control flag system was proposed and applied to describe data quality control results and processing procedure information. Real observational data from ECSSOS were used to implement and test the proposed method. The results demonstrated that the proposed method was effective at detecting and repairing data quality problems in seafloor observatory data.
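The detect-then-repair idea described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes an ARIMA(1,1,0) model without a constant, fitted by least squares inside a sliding window, as a simplified stand-in for the improved ARIMA model; linear interpolation as the repair step; and hypothetical flag values. The names `detect_outliers`, `repair`, `window`, `k`, and `min_tol` are illustrative and do not come from the paper.

```python
from statistics import pstdev

# Hypothetical quality control flags (not the flag system defined in the paper).
GOOD, SUSPECT, INTERPOLATED, MISSING = 1, 3, 8, 9


def forecast_next(history, k, min_tol):
    """One-step ARIMA(1,1,0) forecast (no constant) plus a tolerance band."""
    diffs = [b - a for a, b in zip(history[:-1], history[1:])]
    den = sum(d * d for d in diffs[:-1])
    num = sum(d0 * d1 for d0, d1 in zip(diffs[:-1], diffs[1:]))
    phi = num / den if den else 0.0  # least-squares AR(1) coefficient
    residuals = [d1 - phi * d0 for d0, d1 in zip(diffs[:-1], diffs[1:])]
    tol = max(k * pstdev(residuals), min_tol)
    return history[-1] + phi * diffs[-1], tol


def detect_outliers(values, window=20, k=4.0, min_tol=1e-6):
    """Flag missing points (None) and points far from the sliding-window forecast."""
    flags = []
    clean = []  # cleaned model input: suspect or missing points are replaced
    for v in values:
        forecast = tol = None
        if len(clean) >= window:
            forecast, tol = forecast_next(clean[-window:], k, min_tol)
        if v is None:
            flags.append(MISSING)
        elif forecast is not None and abs(v - forecast) > tol:
            flags.append(SUSPECT)
        else:
            flags.append(GOOD)
            clean.append(v)
            continue
        # Keep the model input evenly spaced without letting bad data in.
        if forecast is not None:
            clean.append(forecast)
        elif clean:
            clean.append(clean[-1])
    return flags


def repair(values, flags):
    """Linearly interpolate non-good points between their nearest good neighbours."""
    out, new_flags = list(values), list(flags)
    good = [i for i, f in enumerate(flags) if f == GOOD]
    for i, f in enumerate(flags):
        if f == GOOD:
            continue
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None or right is None:
            continue  # no bracketing good points: leave value and flag as they are
        w = (i - left) / (right - left)
        out[i] = values[left] + w * (values[right] - values[left])
        new_flags[i] = INTERPOLATED
    return out, new_flags


# Usage: a slow linear drift with one spike and one dropped sample.
series = [8.0 + 0.01 * i for i in range(40)]
series[30] = 9.5
series[35] = None
flags = detect_outliers(series)
repaired, repaired_flags = repair(series, flags)
```

Replacing suspect and missing points with the forecast before they enter the model window is one way to read "cleaning the input modeling data": it keeps a single spike from distorting the forecasts for the points that follow it.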

Highlights

  • Seafloor observatories, a universally recognized third observation platform for humans, have become the most remarkable trend in international marine science and technology [1]

  • We extended the application of the autoregressive integrated moving average (ARIMA) model to seafloor observatory data and improved the model by using a sliding window and cleaning the modeling data

  • The ARIMA method balances precision and recall and obtains a fairly high F1 score, which indicates that it is effective for outlier detection in pH data
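As a reminder of how the metrics in the last highlight relate, the following minimal sketch computes precision, recall, and F1 for an outlier detector from sets of sample indices. The function name and the example indices are illustrative assumptions, not values from the paper.

```python
def precision_recall_f1(true_outliers, predicted_outliers):
    """Compare detected outlier indices against labelled outlier indices."""
    truth, pred = set(true_outliers), set(predicted_outliers)
    tp = len(truth & pred)  # correctly detected outliers
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Usage with made-up indices: 2 of 3 detections are correct, 2 of 4 outliers found.
precision, recall, f1 = precision_recall_f1({3, 7, 12, 20}, {3, 7, 15})
```

F1 is the harmonic mean of precision and recall, so it is high only when the detector both avoids false alarms and misses few true outliers.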


Summary

Introduction

Seafloor observatories, a universally recognized third observation platform for humans, have become the most remarkable trend in international marine science and technology [1]. Designed with all marine equipment under the sea, cabled seafloor observatories use submarine cables to provide power and transmit information between underwater instruments and shore-based stations. This setup permits the acquisition of long-term, real-time, continuous, high-resolution data in large volumes from in situ instruments [2]. Data collected from seafloor observatories have provided powerful insights into complex oceanographic processes and are widely used in scientific research, such as studies of geosphere, biosphere, and hydrosphere interactions and their evolution and variability through time [3]. Automated methods for rapidly identifying and correcting problematic data are therefore essential [4].
