Abstract

One of the major focuses in the remote sensing community is the rapid processing of very high resolution (VHR) aerial images with deep neural networks (DNNs). Few studies have investigated accelerating training and prediction by optimizing the architecture of the DNN system rather than designing a lightweight DNN. Parallel processing with multiple graphics processing units (GPUs) improves VHR image processing performance, but it drives extremely large and frequent data transfers (input/output (I/O)) from random access memory (RAM) to GPU memory. The resulting system bus congestion can stall the system and lead to long latencies in training and prediction. In this paper, we analyze the causes of this latency and propose a space-to-speed (S2S) DNN system to overcome these challenges. We present a three-level memory system that reduces data transfer during system operation. Distribution optimization with parallel processing accelerates training; training optimizations for VHR images, such as hot-zone searching and image/ground-truth queues for data saving, make training on VHR images efficient; and inference optimization speeds up prediction in release mode. To verify the efficiency of the proposed system, we tested its performance and accuracy on the aerial image labeling benchmark from the Institut National de Recherche en Informatique et en Automatique (INRIA) and the Massachusetts Institute of Technology Aerial Imagery for Roof Segmentation (MITAIRS) benchmark. Without loss of accuracy, the S2S system with eight GPUs in a normal setting reduced prediction time on the test set from 534 to 72 s on the INRIA dataset and from 818 to 120 s on the MITAIRS dataset. With prediction in half-float (float-16 data), 8-GPU parallel processing further reduced the time to 38 s on the INRIA dataset and 83 s on the MITAIRS dataset. In a pressure test on 18,000 images of 5000 × 5000 pixels, the proposed system reduced processing time from 18.2 to 1.8 h with prediction in full-float (float-32 data) and to 43 min with prediction in half-float, speedups of 9.78× and 25.3×, respectively, compared with running the system without optimization.
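As a rough illustration of the half-float, multi-GPU prediction idea summarized above, the sketch below casts a segmentation network to float-16 and spreads tile batches across the available GPUs. This is a minimal sketch only, not the authors' implementation: the abstract does not name a framework, so PyTorch, the `model` object, the helper `predict_tiles`, and the batch size are all assumptions.

```python
# Minimal sketch (not the authors' code): half-float, multi-GPU tile prediction,
# assuming PyTorch and a pretrained segmentation network `model`.
import torch

def predict_tiles(model, tiles, use_fp16=True):
    """Run batched inference on image tiles, optionally in float-16.

    tiles: float32 tensor of shape (N, C, H, W), already normalized.
    Returns per-tile predictions on the CPU as float32.
    """
    device = torch.device("cuda")
    model = model.to(device).eval()
    if use_fp16:
        model = model.half()                  # cast weights to float-16
    if torch.cuda.device_count() > 1:
        model = torch.nn.DataParallel(model)  # simple data-parallel prediction

    outputs = []
    with torch.no_grad():
        for batch in tiles.split(32):         # batch size 32 is an arbitrary choice
            batch = batch.to(device)
            if use_fp16:
                batch = batch.half()          # inputs must match model precision
            outputs.append(model(batch).float().cpu())
    return torch.cat(outputs)
```

The float-16 cast roughly halves GPU memory traffic per tile, which is consistent with the abstract's reported speedups for half-float prediction, although the actual S2S system also relies on its three-level memory design and I/O optimizations that this sketch does not reproduce.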
