Abstract

Creating sample datasets for semantic identification of animals with quick behaviors, such as cats, is difficult, and high-speed motion-capture cameras greatly increase the cost of dataset collection. Taking the common Chinese dragon-li cat as an example, this paper performs solid modeling and gait-motion design in digital 3D modeling software, then constructs the virtual image dataset vCAT through image rendering. To account for varying factors such as body shape, scene, and camera perspective, the vCAT dataset contains numerous data sub-categories. Using deep learning with an attention mechanism trained on the vCAT dataset, we performed pixel-level segmentation tests on both synthetic images and real-scene images. On the basis of this training, recognition accuracy reached 84% with only 726 real-scene images. Beyond providing a fully automated technical pipeline for generating the original images of the virtual dataset together with their corresponding labels, we thoroughly test and discuss factors such as scene, perspective, texture, and lighting in the virtual dataset, laying a foundation for its practical use in deep-learning training.
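The attention mechanism mentioned above is not specified in the abstract; as a rough illustration only, the sketch below shows one common form, a CBAM-style spatial attention gate that reweights each location of a feature map before segmentation. All names and the fixed pooling weights are hypothetical stand-ins, not the paper's implementation (a real model would learn the mixing convolution).

```python
import numpy as np

def spatial_attention(features):
    """Reweight a (H, W, C) feature map by a per-location attention score.

    Hypothetical sketch: channel-wise average and max pooling are combined
    and passed through a sigmoid, as in CBAM-style spatial attention. A
    trained model would mix the two pooled maps with a learned convolution;
    here a fixed 0.5/0.5 average stands in for that learned step.
    """
    avg_pool = features.mean(axis=-1, keepdims=True)   # (H, W, 1)
    max_pool = features.max(axis=-1, keepdims=True)    # (H, W, 1)
    score = 0.5 * (avg_pool + max_pool)                # stand-in for a conv
    attn = 1.0 / (1.0 + np.exp(-score))                # sigmoid -> (0, 1)
    return features * attn                             # broadcast over channels
```

Regions whose features score highly keep most of their activation, while low-scoring background locations are attenuated, which helps the segmentation head focus on the animal's silhouette.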
