Abstract

The Vision Transformer (ViT) has recently emerged as the standard architecture for image classification, overtaking conventional Convolutional Neural Network (CNN) models. However, these state-of-the-art models require large amounts of data, typically over 100 million images, to achieve optimal performance through transfer learning. This requirement is usually met with proprietary datasets such as JFT-300M or JFT-3B, which are not publicly available. To overcome these challenges and address privacy concerns, Formula-Driven Supervised Learning (FDSL) has been introduced. FDSL trains deep learning models on synthetic images generated from mathematical formulas, such as fractals and radial contour images. The main objective of this approach is to reduce the I/O bottleneck that arises when training on large datasets. Our implementation of FDSL generates instances in real time during training, using a custom data loader based on EGL (Native Platform Graphics Interface) for fast shader-based rendering. Evaluated on the FractalDB-100k dataset of 100 million images, our custom data loader achieves loading times roughly three times faster than the PyTorch Vision loader.
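As a rough illustration of the formula-driven idea (a minimal sketch, not the paper's shader-based implementation), a FractalDB-style instance can be sampled with the chaos game over an iterated function system; the affine parameters below are the classic Sierpinski triangle and stand in for the randomly sampled per-category parameters the approach assumes:

```python
import random

def render_ifs_fractal(transforms, n_points=10000, seed=0):
    """Sample 2-D points of a fractal attractor via the chaos game.

    `transforms` is a list of affine maps (a, b, c, d, e, f) applied as
    (x, y) -> (a*x + b*y + e, c*x + d*y + f). FDSL datasets such as
    FractalDB build categories from randomly sampled parameter sets.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    points = []
    for i in range(n_points):
        a, b, c, d, e, f = rng.choice(transforms)
        x, y = a * x + b * y + e, c * x + d * y + f
        if i > 20:  # discard burn-in steps before the orbit reaches the attractor
            points.append((x, y))
    return points

# Sierpinski-triangle IFS: three half-scale contractions toward the corners.
sierpinski = [
    (0.5, 0.0, 0.0, 0.5, 0.0, 0.0),
    (0.5, 0.0, 0.0, 0.5, 0.5, 0.0),
    (0.5, 0.0, 0.0, 0.5, 0.25, 0.5),
]
pts = render_ifs_fractal(sierpinski, n_points=5000)
```

Rasterizing such point sets into images on the GPU at training time, rather than reading pre-rendered files from disk, is what lets the approach sidestep the I/O bottleneck the abstract mentions.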
