Intelligent map analysis is an important yet challenging task. Recent advances in large models, especially Vision-Language Models (VLMs), have shown promise for intelligent image analysis. However, these models are trained primarily on natural images, which differ intrinsically from maps, so a gap remains in applying general-domain VLMs to map analysis. To address this gap, we propose a framework for developing a specialized VLM, called MapReader. We first collect a comprehensive data resource using a strategy that combines self-instruct generation with expert refinement, comprising training data (MapTrain: 2,000 map–description pairs) and evaluation data (MapEval: 250 maps and 500 map-related questions). MapReader is then fine-tuned from a general-domain VLM on MapTrain to learn to understand and describe map contents. Evaluation on MapEval suggests that: (1) MapReader can accept map inputs and generate detailed descriptions of core geographic information, and it also possesses visual question-answering capabilities, showing potential for application in map analysis scenarios such as accessible map reading and robotic map use; (2) the proposed data collection strategy is effective, and the collected dataset can serve as a benchmark to promote further map analysis research.
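The self-instruct-with-expert-refinement collection strategy described above can be pictured as a two-stage loop: a model drafts candidate map descriptions, and a human expert corrects and approves them before they enter the training set. The following is a minimal illustrative sketch of that workflow; every function and name here (`draft_description`, `expert_refine`, `MapSample`) is a hypothetical placeholder, not the authors' actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class MapSample:
    """One training pair: a map identifier and its textual description."""
    map_id: str
    description: str
    expert_approved: bool = False

def draft_description(map_id: str) -> str:
    # Stand-in for the self-instruct step, where a general-domain VLM
    # would generate a seed description of the map image.
    return f"Auto-generated description of {map_id}"

def expert_refine(sample: MapSample) -> MapSample:
    # Stand-in for the expert-refinement step: a human corrects the
    # draft; here we only do a trivial cleanup and mark approval.
    sample.description = sample.description.strip()
    sample.expert_approved = True
    return sample

def build_training_pairs(map_ids):
    # Run every map through draft -> refine to build the dataset,
    # analogous in spirit to assembling MapTrain.
    return [expert_refine(MapSample(mid, draft_description(mid)))
            for mid in map_ids]

pairs = build_training_pairs(["map_001", "map_002"])
print(len(pairs), all(p.expert_approved for p in pairs))
```

In a real pipeline, the refined pairs would then be used to fine-tune a general-domain VLM; the sketch only shows the data-assembly loop.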