Abstract

Spatial vector data represents real-world spatial information through points, lines and polygons. Spatial data quality is a basic theoretical research topic in geographic information science, and accurate, reliable quality assessment has both theoretical significance and practical value. This paper proposes an improved method for the traditional classification accuracy evaluation of spatial vector data, in five steps. (1) Quantitative estimation of sample size. Following the statistical principles of probability theory, the sample size is estimated by controlling the sampling error and the acceptance quality level. The sample quality is an unbiased estimate of the overall quality. (2) Stratification strategy. All objects are divided into three layers according to the three basic geometric structures: points, lines and polygons. Differences within a layer are small and differences between layers are large, which conforms to the basic principle of stratification. The proportion of the total number of elements falling in each layer is then used as a weight to distribute the sample layer by layer, giving the sample size of each layer. (3) Allocation of samples. The spatial character of spatial sampling is reflected mainly in how samples are allocated. To account for the spatial correlation of elements within the same layer, the Local Moran's I index is used to calculate the degree of correlation of a given attribute between each spatial element and its neighbouring elements. After cluster analysis of the elements in each layer, samples are screened by setting a reasonable threshold value. (4) Sample inspection. Each sample is examined against reference information, including images and data, and its classification is decided by majority judgment. (5) Classification accuracy assessment. A confusion matrix is built from the classification results of the samples and the true results to obtain the classification accuracy of the samples, and the classification accuracy of the experimental data is evaluated with the Kappa index. A case study on the Global Core Vector Data of Japan demonstrates the improved method and the process of classification accuracy assessment for a regional spatial vector data product. Global Core Vector Data are organized by country or region and comprise three major categories (transportation, river systems and place names), which are divided into 8 middle categories and 52 small categories. In this paper, 1405 samples of Global Core Vector Data in the experimental area of Japan are selected by spatial stratified sampling in 3 strata. The experimental results show that the proposed method is applicable to classification accuracy assessment of regional spatial vector data products and overcomes the disadvantage of type-based spatial stratified sampling, which relies on the classification information of all elements. The Kappa coefficient is 0.831, which indicates that the classification accuracy in the experimental area is good. The proposed method provides a reference for classification accuracy assessment of subsequent global spatial vector data products.
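
Step (5) can be illustrated with a minimal sketch of how a Kappa coefficient is derived from a confusion matrix, using the standard Cohen's kappa definition, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name `cohen_kappa` and the 2x2 counts are hypothetical and chosen only for illustration; they are not the paper's data, and the paper may compute its index with a different tool.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix.

    matrix[i][j] = number of samples classified as class i
    whose reference (true) class is j.
    """
    size = len(matrix)
    if any(len(row) != size for row in matrix):
        raise ValueError("confusion matrix must be square")

    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Observed agreement: share of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(size)) / total

    # Chance agreement: product of row and column marginals, summed over classes.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(size)) for j in range(size)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical counts for two classes (illustration only, not from the paper).
example = [
    [45, 5],
    [10, 40],
]
kappa = cohen_kappa(example)
```

Values closer to 1 indicate agreement well beyond chance, which is the sense in which the reported coefficient of 0.831 is read as good.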

Highlights

  • A geospatial database is a database of geographic data and information, such as countries, cities, natural landscapes, cultural landscapes and related information (Donath et al., 2006)

  • The Global Core Vector Data (GCVD) produced in China is analysed in terms of the characteristics of vector data, and selected regions of Japan are used as experimental examples

  • Taking into account the characteristics of spatial data and sound probability and statistics methods, the sampling method, the sample layout method and the determination of sample size are improved, and a classification accuracy assessment method for regional spatial sampling based on geometric structure is proposed
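
The sample size determination and layer-by-layer distribution mentioned above can be sketched as follows. The page does not state the paper's estimation model, so this sketch assumes a commonly used formula for estimating a proportion with a finite population correction (Cochran's formula), followed by proportional allocation across the point, line and polygon layers. The function names, parameter defaults and layer counts are hypothetical; only the total of 1405 samples comes from the abstract.

```python
import math


def sample_size(population, margin, p=0.5, z=1.96):
    """Assumed model: Cochran's formula with finite population correction.

    population: number of elements in the data set
    margin:     tolerated sampling error (e.g. 0.05)
    p:          expected proportion (0.5 is the most conservative choice)
    z:          normal quantile for the confidence level (1.96 for 95%)
    """
    n0 = z * z * p * (1 - p) / (margin * margin)
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))


def allocate_proportional(total, strata_counts):
    """Split `total` samples across strata in proportion to their element counts.

    Fractions are rounded down first; leftover samples go to the strata
    with the largest fractional remainders so the parts sum to `total`.
    """
    population = sum(strata_counts.values())
    raw = {name: total * count / population for name, count in strata_counts.items()}
    alloc = {name: math.floor(value) for name, value in raw.items()}
    leftover = total - sum(alloc.values())
    by_remainder = sorted(raw, key=lambda name: raw[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


# Hypothetical element counts per geometric layer (not from the paper).
layers = {"points": 20000, "lines": 50000, "polygons": 30000}
per_layer = allocate_proportional(1405, layers)
```

Proportional allocation keeps each layer's share of the sample equal to its share of the data set, which matches the weighting described in the abstract; other allocation rules, such as Neyman allocation, would need per-layer variance estimates.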


Summary

INTRODUCTION

A geospatial database is a database of geographic data and information, such as countries, cities, natural landscapes, cultural landscapes and related information (Donath et al., 2006). Many geospatial vector databases exist around the world, such as the National Register Information System (NRIS), OpenStreetMap (OSM), the European Soil Database and Geo-Names Data. Each of these databases is specific to a particular application field or area and lacks broad coverage and generality. As for sample size, existing sampling methods rely mainly on expert experience or fixed values and lack a reasonable and effective estimation model, so sample sizes are generally low.

Sampling principles
Sampling Method
Determination of sample size
Allocation of samples
EMPIRICAL CASE STUDIES
Experimental data for accuracy assessment
Experimental procedure for precision evaluation
Analysis of experimental results
Findings
CONCLUSIONS
