Segmentation of Geophysical Data: A Big Data Friendly Approach

David R.B. Stockwell, Ligang Zhang, Brijesh Verma

doi:10.1016/j.procs.2015.07.277

Abstract

A new scalable segmentation algorithm is proposed in this paper for the forensic determination of level shifts in geophysical time series. While a number of segmentation algorithms exist, they are generally not ‘big data friendly’, due either to computation time that scales quadratically in the length of the series N or to subjective penalty parameters. The proposed algorithm is called SumSeg, as it collects a table of potential break points via iterative ternary splits on the extreme values of the scaled partial sums of the data. It then filters the break points on their statistical significance and peak shape. Our algorithm is linear in N and logarithmic in the number of breaks B, while returning a flexible nested segmentation model that can be objectively evaluated using the area under the receiver operating characteristic curve (AUC). We demonstrate the comparative performance of SumSeg against three other algorithms. SumSeg is available as an R package from the development site at http://github.com/davids99us/anomaly.
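The ternary split on partial-sum extrema described in the abstract can be illustrated with a minimal Python sketch. This is not the SumSeg implementation: the function name `ternary_split_candidates`, the parameters `min_len` and `max_depth`, the scaling by the segment standard deviation times the square root of its length, and the use of the absolute scaled partial sum as a score are all assumptions made for illustration. The significance and peak-shape filtering step is omitted.

```python
from itertools import accumulate
from math import sqrt
from statistics import fmean, pstdev


def ternary_split_candidates(x, min_len=5, max_depth=3):
    """Collect (position, score, depth) break candidates by recursive ternary splits.

    Hypothetical sketch: each segment is cut at the minimum and maximum of its
    scaled, mean-centered partial sums, giving up to three child segments.
    """
    x = [float(v) for v in x]
    candidates = []

    def recurse(lo, hi, depth):
        n = hi - lo
        if depth >= max_depth or n < 3 * min_len:
            return
        seg = x[lo:hi]
        sd = pstdev(seg)
        if sd == 0.0:  # a constant segment has no level shift to find
            return
        mean = fmean(seg)
        scale = sd * sqrt(n)  # assumed scaling, not taken from the paper
        s = [v / scale for v in accumulate(y - mean for y in seg)]
        inner = s[:-1]  # the final partial sum is zero by construction
        i_min = min(range(len(inner)), key=inner.__getitem__)
        i_max = max(range(len(inner)), key=inner.__getitem__)
        # A break position is the first index of the segment after the extremum.
        for i in sorted({i_min, i_max}):
            candidates.append((lo + i + 1, abs(inner[i]), depth))
        bounds = [lo] + sorted({lo + i_min + 1, lo + i_max + 1}) + [hi]
        for a, b in zip(bounds, bounds[1:]):
            recurse(a, b, depth + 1)

    recurse(0, len(x), 0)
    return sorted(candidates)
```

For a series with a single upward level shift, the mean-centered partial sums fall until the shift and rise afterwards, so the minimum marks the break and receives the largest score. The recursion depth bounds the work per level to one pass over the data, which is in the spirit of the scaling the abstract claims, although this sketch makes no attempt to reproduce the paper's exact cost.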

Highlights

Has the level of a time series changed due to natural variation or an external influence? Abrupt changes in level can be due to instrument faults or reconfiguration, so detecting them is necessary for QA/QC on data from weather stations [1] and automatic tide or stream level gauges
The computation time of SumSeg is linear in the length of the series and logarithmic in the number of breaks
It outputs a set of possible break points and their statistical significance, which can be evaluated and optimized using the receiver operating characteristic (ROC) curve and the area under that curve (AUC)
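As a rough illustration of how a table of candidate break points might be scored with the AUC, the sketch below computes the rank-based (Mann-Whitney) form of the statistic in plain Python. It assumes each candidate carries a score where higher means more likely to be a break (for example, one minus a p-value) and a reference label marking whether it matches a known break, such as in simulated data. The function name `auc` and both inputs are hypothetical and are not taken from the SumSeg package.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    scores: higher means the candidate is more likely a true break (assumed).
    labels: truthy for candidates that match a known break (assumed reference).
    """
    pos = [s for s, is_break in zip(scores, labels) if is_break]
    neg = [s for s, is_break in zip(scores, labels) if not is_break]
    if not pos or not neg:
        raise ValueError("need at least one true and one false candidate")
    # Count the pairs where a true break outranks a false one; ties count half.
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Four candidates, two of which match known breaks.
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

An AUC of 1.0 means every true break is ranked above every false candidate, while 0.5 is what random scoring would give. The pairwise loop is quadratic in the number of candidates, which is acceptable here because the candidate table is small relative to the series.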

Summary

Introduction

Has the level of a time series changed due to natural variation or an external influence? Abrupt changes in level can be due to instrument faults or reconfiguration, so detecting them is necessary for QA/QC on data from weather stations [1] and automatic tide or stream level gauges. The level changes in a segmentation model may represent gene expression in micro-array comparative genomic hybridization data [2], regime shifts in climate data [3], breakouts in stock prices, Twitter or web service logs, or features of interest in weak machine learning classifiers [4]. A big data friendly algorithm needs a linear or better order of increase in computational cost with the data length N and the number of breaks B. This paper makes three major contributions: 1) a novel ternary split segmentation algorithm, based on the minimum and maximum extrema of the partial sums and available as an R package; 2) identification of linearity in the length of the data and the number of breaks as crucial computational criteria for scaling segmentation; 3) use of the AUC, a familiar statistical learning metric, as the criterion for breakpoints.

Related Work

Proposed Algorithm

Experiments

Conclusions

Journal: Procedia Computer Science
Publication Date: Jan 1, 2015
License type: cc-by
