The conventional approach to image recognition has been based on raster graphics, which can suffer from aliasing and information loss when scaled up or down. In this paper, we propose a novel approach that leverages the benefits of vector graphics for object localization and classification. Our method, called YOLaT (You Only Look at Text), takes the textual document of a vector graphic as input rather than rendering it into pixels. YOLaT builds multi-graphs to model the structural and spatial information in vector graphics and applies a dual-stream graph neural network (GNN) to detect objects from the graph. However, for real-world vector graphics, YOLaT models only a flat GNN with vertices as nodes, ignoring the higher-level information of vector data. We therefore propose YOLaT++, which learns multi-level abstraction features from a new perspective: from primitive shapes to curves and points. In addition, because few public datasets focus on vector graphics, data-driven learning cannot exert its full power on this format. We provide a large-scale and challenging dataset for chart-based vector graphics detection and chart understanding, termed VG-DCU, comprising vector graphics, raster graphics, annotations, and the raw data used to create these vector charts. Experiments show that the YOLaT series outperforms both vector-based and raster-based object detection methods on both subsets of VG-DCU in terms of accuracy and efficiency, showcasing the potential of vector graphics for image recognition tasks.
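For illustration, the following is a minimal sketch of the kind of multi-graph construction the abstract describes: control points of vector primitives become graph nodes, with one edge set capturing structural (same-curve) relations and one capturing spatial (nearest-neighbour) relations, which a dual-stream GNN could then process separately. All names, and the use of k-nearest-neighbour spatial edges, are simplifying assumptions for this sketch, not the authors' implementation.

```python
# Hypothetical sketch: build node and edge sets from vector-graphic curves.
# Input: a list of Bezier curves, each given as a list of (x, y) control points.
import math

def build_multigraph(curves, k=3):
    """Return (nodes, structural_edges, spatial_edges) for a multi-graph."""
    nodes = []       # flat list of (x, y) control points
    structural = []  # edges between consecutive points of the same curve
    for curve in curves:
        start = len(nodes)
        nodes.extend(curve)
        structural += [(start + i, start + i + 1) for i in range(len(curve) - 1)]

    spatial = []     # k-nearest-neighbour edges over all points
    for i, (xi, yi) in enumerate(nodes):
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(nodes) if j != i
        )
        spatial += [(i, j) for _, j in dists[:k]]
    return nodes, structural, spatial

# Two cubic curves sharing an endpoint; the two edge sets would feed the
# two streams of a dual-stream GNN.
curves = [
    [(0, 0), (1, 1), (2, 1), (3, 0)],
    [(3, 0), (4, -1), (5, -1), (6, 0)],
]
nodes, structural, spatial = build_multigraph(curves)
print(len(nodes), len(structural), len(spatial))  # 8 6 24
```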