GIScience 2016 Short Paper Proceedings

Multi-resolution, pattern-based segmentation of very large raster datasets

J. Jasiewicz 1,2, J. Niesterowicz 1, T. F. Stepinski 1

1 Space Informatics Lab, University of Cincinnati, 401 Braunstein Hall, Cincinnati, OH 45221-0131, US
Email: {niestejk; stepintz}@mail.uc.edu
2 Institute of Geoecology and Geoinformation, Adam Mickiewicz University, Dziegielowa 27, Poznan, Poland
Email: jarekj@amu.edu.pl

Abstract

We present an algorithm that efficiently segments very large categorical rasters based on patterns of their categories. It operates on a grid of motifels – square blocks of raster cells, each representing a local pattern. Our algorithm is based on the seeded region growing principle, but it uses a novel grid topology and a stack of seeds with individual thresholds. It has a single free parameter – the spatial scale of a pattern. The algorithm has proven to be robust on land cover data, topographic landform data, and high-resolution color-quantized RGB images. We present a multi-scale segmentation of NLCD2011 as an example. Potential applications of the new algorithm include ecology, geomorphology, pedology, forestry, agriculture, and urban studies.

1. Introduction

Segmentation is the process of partitioning a raster dataset into multiple homogeneous segments. The goal of segmentation is to spatially generalize a raster so that it provides more insight and is easier to analyze. The bulk of the existing work (Zhang et al. 2008) has focused on segmentation of images of relatively small scenes. However, segmentation of datasets that originate from remote sensing and cover large, continental or even global-scale areas is also important. Existing segmentation algorithms are ineffective for such large datasets, and custom algorithms are lacking.
Examples of such datasets are the National Land Cover Dataset (NLCD), which covers the conterminous US (CONUS) at a resolution of 30 m, and the SRTM-based DEM, which has a worldwide extent at 90 m resolution. Segmentation of the NLCD could yield landscape types, which are precursors to ecoregions, and segmentation of the worldwide DEM could delineate physiographic regions. In this paper we describe a segmentation algorithm designed specifically for very large rasters. Such datasets have the following characteristics. (1) They are mosaics of multiple datasets; thus it is better to segment a secondary product of uniform quality (for example, a land cover map) than a montage of primary data of variable quality (for example, a montage of Landsat scenes). (2) They are large; for example, the NLCD consists of ~8 billion cells and has a size of ~16 GB. (3) The goal of the segmentation is to identify regions characterized by patterns that are homogeneous on a scale that is large in comparison to the resolution of the raster, hence the need for pattern-based segmentation. To deal with the large size of the input, the proposed algorithm is based on the concept of Complex Object-Based Image Analysis (COBIA) (Vatsavai 2013, Stepinski et al. 2015). In COBIA the