Abstract

Some information contained in historical topographic maps has yet to be captured digitally, which limits the ability to automatically query such data. For example, the U.S. Geological Survey’s historical topographic map collection (HTMC) displays millions of spot elevations at locations that were carefully chosen to best represent the terrain at the time. Although prior research has attempted to reproduce these data points, existing methods have proven inadequate for automatically detecting and recognizing spot elevations in the HTMC. We propose a deep learning workflow pretrained on large benchmark text datasets. To these datasets we add manually crafted training image/label pairs and test how many are required to improve prediction accuracy. We find that the initial model, pretrained solely on benchmark data, fails to predict any HTMC spot elevations correctly, whereas adding just 50 custom image/label pairs increases predictive ability by ∼50%, and including 350 pairs increases performance by ∼80%. Data augmentation in the form of rotation, scaling, and translation (offset) expanded the size and diversity of the training dataset and improved recognition accuracy to as much as ∼95%. Visualization methods, such as heat map generation and salient feature detection, can be used to better understand why some predictions fail.
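The rotation, scaling, and translation named in the abstract are all affine transforms, so the label geometry of a training pair can be augmented with a single 2x3 matrix. The sketch below is illustrative only and is not the authors' implementation: the function names, the parameter ranges (±15°, 0.8 to 1.2 scale, ±10 pixel shift), and the choice to transform only the label's bounding-box corners (rather than resample image pixels) are all assumptions made for this example.

```python
import math
import random


def affine_matrix(angle_deg, scale, tx, ty, center=(0.0, 0.0)):
    """Build a 2x3 matrix for p' = scale * R(angle) * (p - center) + center + (tx, ty).

    Angles follow the usual math convention; in y-down image coordinates
    a positive angle appears as a clockwise rotation.
    """
    a = math.radians(angle_deg)
    cos_a = scale * math.cos(a)
    sin_a = scale * math.sin(a)
    cx, cy = center
    return [
        [cos_a, -sin_a, cx - cos_a * cx + sin_a * cy + tx],
        [sin_a, cos_a, cy - sin_a * cx - cos_a * cy + ty],
    ]


def transform_points(matrix, points):
    """Apply a 2x3 affine matrix to a list of (x, y) points."""
    return [
        (
            matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2],
        )
        for x, y in points
    ]


def random_augmentations(box, n, center, seed=0,
                         max_angle=15.0, scale_range=(0.8, 1.2), max_shift=10.0):
    """Return n randomly rotated, scaled, and shifted copies of a label box.

    The default ranges are hypothetical, not values reported in the paper.
    """
    rng = random.Random(seed)
    augmented = []
    for _ in range(n):
        matrix = affine_matrix(
            rng.uniform(-max_angle, max_angle),
            rng.uniform(*scale_range),
            rng.uniform(-max_shift, max_shift),
            rng.uniform(-max_shift, max_shift),
            center,
        )
        augmented.append(transform_points(matrix, box))
    return augmented


# Hypothetical label box (corner points in pixels) around a spot elevation.
label_box = [(40.0, 20.0), (80.0, 20.0), (80.0, 36.0), (40.0, 36.0)]
variants = random_augmentations(label_box, n=5, center=(60.0, 28.0))
```

In a full pipeline the same matrix would also be used to warp the image itself, so that each augmented image stays aligned with its transformed label.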

Highlights

  • Spot elevations were depicted on historical maps to improve the reader’s interpretation of the terrain, support the terrain representation shown by contours, indicate points of interest, and, in the case of those at summits and passes, assist aviators when navigating; they are not depicted on modern vectorized topographic maps in the United States (Arundel and Sinha 2020)

  • This paper reports progress toward using deep learning optical character recognition (OCR) to detect and recognize spot elevations on U.S. Geological Survey (USGS) historical topographic map images and discusses how this technology can be further applied to extract other information from map content

  • Our experiments demonstrate that deep learning OCR is a viable method for map text interpretation


Summary

Introduction

Spot elevations were depicted on historical maps to improve the reader’s interpretation of the terrain, support the terrain representation shown by contours, indicate points of interest, and, in the case of those at summits and passes, assist aviators when navigating. They are not depicted on modern (digital) vectorized topographic maps in the United States (Arundel and Sinha 2020). To remedy this absence, research on spot elevations/heights has concentrated on automating techniques to choose appropriate features from the myriad of elevational peaks for cartographic display on topographic maps, because their manual selection and generalization are both expensive and time consuming (Baella et al. 2007; Palomar-Vázquez and Pardo-Pascual 2008). Extracting specific features from topographic maps not captured as theme geodatabases is a challenging, enduring effort (Pezeshk 2011; Ganpatrao and Ghosh 2014; Arundel et al. 2020; Li et al. 2020; Shbita et al. 2020).

