Abstract
Marine image analysis faces a multitude of challenges: data set sizes easily reach the Terabyte scale; the underwater visual signal is often impaired to the point where its information content becomes negligible; and interpreters are scarce and, due to the annotation effort involved, can only focus on subsets of the available data. Solutions to speed up the analysis process have been presented in the form of semi-automation with artificial-intelligence methods such as machine learning. But the algorithms employed to automate the analysis commonly rely on large-scale compute infrastructure, which so far has only been available on-shore. Here, a mobile compute cluster is presented to bring big image data analysis capabilities out to sea. The Sea-going High-Performance Compute Cluster (SHiPCC) units are mobile, robustly designed to operate with impure ship-based power supplies, and based on off-the-shelf computer hardware. Each unit comprises up to eight compute nodes with graphics processing units (GPUs) for efficient image analysis and internal storage to manage the big image data sets. The first SHiPCC unit has been successfully deployed at sea. It allowed semantic and quantitative information to be extracted from a Terabyte-sized image data set within 1.5 hours (a relative speedup of 97% compared to a single four-core CPU computer). Enabling such compute capability out at sea allows image-derived information to be included in the cruise research plan, for example by determining promising sampling locations. The SHiPCC units are envisioned to generally improve the relevance and importance of optical imagery for marine science.
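As a back-of-the-envelope check of the reported speedup (assuming, as our reading rather than an explicit statement in the text, that the 97% figure denotes the relative reduction in wall-clock time):
\[
s = \frac{t_{\text{single}} - t_{\text{cluster}}}{t_{\text{single}}}, \qquad
t_{\text{single}} = \frac{t_{\text{cluster}}}{1 - s} = \frac{1.5\,\text{h}}{1 - 0.97} = 50\,\text{h},
\]
i.e. the single four-core machine would need roughly two days for the same Terabyte-sized data set.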
Highlights
Data science is becoming more important in many research domains, and marine science is no exception
Three units of the compute cluster are available for deployment at sea
Two JAGO dives with BubbleBox deployments provided ca. 1 TB of grayscale imagery representing eight individual bubble streams
Summary
Data science is becoming more important in many research domains, and marine science is no exception. Larger data sets, at the Gigabyte to multi-Terabyte scale, can be analyzed more efficiently by clusters of computers (Beloglazov et al., 2012). Such clusters apply a selected algorithm to multiple data items in parallel by distributing the workload across many compute nodes. They are usually operated by the central computing centers of research institutes. They are commonly stationary, mounted in 19″ racks, cooled, and may consist of tens of thousands of compute nodes. Their individual units are heavy and rely on a consistent power supply.
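To make the workload-distribution idea concrete, the following minimal Python sketch (our illustration, not the SHiPCC software; the analyze_image function and the images directory are hypothetical) farms an image-analysis function out over a pool of workers, the same one-algorithm-over-many-data-items pattern a cluster scheduler applies across compute nodes:

# Minimal sketch of data-parallel image analysis (hypothetical example,
# not the authors' software). Each worker applies the same analysis
# function to a different image, mirroring how a cluster distributes
# one algorithm over many data items and compute nodes.
from multiprocessing import Pool
from pathlib import Path

def analyze_image(path: Path) -> tuple[str, int]:
    """Placeholder analysis: reports the file size as a stand-in
    for a real, GPU-backed detection or segmentation step."""
    return (path.name, path.stat().st_size)

if __name__ == "__main__":
    images = sorted(Path("images").glob("*.png"))  # hypothetical data directory
    with Pool(processes=8) as pool:  # e.g. one worker per core or node
        for name, size in pool.imap_unordered(analyze_image, images):
            print(f"{name}: {size} bytes")

On an actual cluster the Pool would be replaced by a job scheduler dispatching the same function to physically separate nodes, but the data-parallel structure is identical.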