Memory Efficiency Research Articles

Autonomous driving requires performing many computation-heavy image recognition tasks with high accuracy, using diverse sensors to perceive the surrounding environment. Specifically, cameras are used for lane detection, object detection, and segmentation, and, in the absence of lidar, the tasks extend to inferring 3D information through depth estimation, 3D object detection, 3D reconstruction, and SLAM. However, accurately processing all of these image recognition operations in real time under constrained hardware conditions is practically infeasible. In this study, considering the characteristics of the image recognition tasks performed by these sensors and the given hardware conditions, we investigated multi-task learning (MTL), which enables parallel execution of various image recognition tasks to maximize their processing speed, accuracy, and memory efficiency. In particular, this study analyzes the combinations of image recognition tasks for autonomous driving and proposes the multi-task decision and optimization (MDO) algorithm, which consists of three steps. In the first step, a multi-task set (MTS) is selected to minimize overall latency while meeting minimum accuracy requirements. Next, the shared backbone and the individual subnets are trained further to improve accuracy for the selected MTS. Finally, both the shared backbone and each subnet are compressed while maintaining the accuracy and latency already achieved. The experimental results indicate that integrated accuracy is critically important in the configuration and optimization of MTL, and that this integrated accuracy is determined by the inter-task correlation (ITC). The MDO algorithm was designed to take these characteristics into account and to construct multi-task sets from tasks that exhibit high ITC.
Furthermore, implementing the proposed MDO algorithm together with additional training based on semi-supervised learning (SSL) resulted in a significant performance improvement: an approximately 12% increase in object detection mAP, a 15% improvement in lane detection accuracy, and a 27% reduction in latency, surpassing the results of previous three-task learning techniques such as YOLOP and HybridNet.
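
The first MDO step described above (choose a multi-task set that minimizes latency subject to minimum accuracy requirements) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Candidate` type, the `select_mts` function, and all numbers are hypothetical, and the sketch assumes that latency and per-task accuracy have already been measured for each candidate configuration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One hypothetical multi-task configuration sharing a backbone."""
    tasks: tuple        # names of the tasks in this set
    latency_ms: float   # measured overall latency (assumed given)
    accuracy: dict      # task name -> measured accuracy in [0, 1]


def select_mts(candidates, min_accuracy):
    """Return the lowest-latency candidate that meets every accuracy floor.

    min_accuracy maps task name -> required accuracy. A candidate that
    lacks a required task is treated as infeasible. Returns None when no
    candidate satisfies all requirements.
    """
    feasible = [
        c for c in candidates
        if all(c.accuracy.get(task, 0.0) >= floor
               for task, floor in min_accuracy.items())
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c.latency_ms)


# Illustrative, made-up measurements.
candidates = [
    Candidate(("lane", "object"), 30.0, {"lane": 0.70, "object": 0.80}),
    Candidate(("lane", "object"), 45.0, {"lane": 0.85, "object": 0.82}),
    Candidate(("lane", "object"), 60.0, {"lane": 0.90, "object": 0.90}),
]
best = select_mts(candidates, {"lane": 0.80, "object": 0.80})
```

Here the 30 ms candidate is rejected because its lane accuracy falls below the floor, so the 45 ms candidate is selected. The later steps (further training and compression) are outside the scope of this sketch.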


Program reduction is a highly practical, widely demanded technique for debugging language tools such as compilers, interpreters, and debuggers. Given a program P that exhibits a property ψ, program reduction conceptually applies various program transformations iteratively to generate a vast number of variants of P by deleting certain tokens, and returns the minimal variant that preserves ψ. A program reduction process inevitably generates duplicate variants, and their number can be significant. Our study reveals that, on average, 61.8% and 24.3% of the variants generated by two representative program reducers, HDD and Perses, respectively, are duplicates. Checking them against ψ is thus redundant and wastes time and computation resources. Although simply caching the generated variants might seem enough to avoid redundant property tests, such a trivial method is impractical in the real world because of its significant memory footprint. Therefore, a memory-efficient caching scheme for program reduction is in great demand. This study is the first effort to conduct a systematic, extensive analysis of memory-efficient caching schemes for program reduction. We first propose using two well-known methods, ZIP and SHA, to compress the generated variants before they are stored in the cache. Furthermore, our understanding of the program reduction process motivates us to propose a novel, domain-specific caching scheme that is both memory-efficient and computation-efficient: Refreshable Compact Caching (RCC). Our key insight is two-fold: ① by leveraging the correlation between the variants and the original program P, we losslessly encode each variant into an equivalent, compact, canonical representation; ② periodically, stale cache entries, which will never be accessed again, are removed to minimize the memory footprint over time.
Our extensive evaluation on 31 real-world C compiler bugs demonstrates that caching schemes avoid 61.8% and 24.3% of redundant queries in HDD and Perses, respectively; correspondingly, runtime performance improves by 22.8% and 18.2%. With regard to memory efficiency, all three methods use less memory than the state-of-the-art string-based scheme STR. Specifically, ZIP and SHA cut the memory footprint by more than 80% and 90%, respectively, in both Perses and HDD compared to STR; moreover, the highly scalable, domain-specific RCC dominates peer schemes and outperforms SHA by 96.4% and 91.74% in HDD and Perses, respectively.
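
To make the caching ideas above concrete, here is a minimal sketch under stated assumptions. It does not reproduce the paper's implementation. `ShaCache` stores a fixed-size SHA-256 digest per variant instead of its text. `MaskCache` is only one plausible way to exploit the fact that variants are obtained by deleting tokens from P: it records which token positions of P are kept as an integer bitmask. Its `refresh` policy (drop entries that keep more tokens than a given bound) is likewise an assumption about what "stale" could mean, and all class and function names are hypothetical.

```python
import hashlib


class ShaCache:
    """Caches property-test results keyed by the SHA-256 digest of a variant."""

    def __init__(self):
        self._results = {}

    def _key(self, variant: str) -> bytes:
        return hashlib.sha256(variant.encode("utf-8")).digest()

    def lookup(self, variant: str):
        """Return the cached result, or None if the variant is unseen."""
        return self._results.get(self._key(variant))

    def store(self, variant: str, result: bool) -> None:
        self._results[self._key(variant)] = result


def encode_as_mask(kept_indices) -> int:
    """Encode a variant as a bitmask: bit i is set if token i of P is kept."""
    mask = 0
    for i in kept_indices:
        mask |= 1 << i
    return mask


class MaskCache:
    """Caches results keyed by the bitmask of kept token positions (assumed encoding)."""

    def __init__(self):
        self._results = {}

    def lookup(self, kept_indices):
        return self._results.get(encode_as_mask(kept_indices))

    def store(self, kept_indices, result: bool) -> None:
        self._results[encode_as_mask(kept_indices)] = result

    def refresh(self, max_tokens: int) -> None:
        """Drop entries keeping more than max_tokens tokens (assumed staleness rule)."""
        self._results = {
            mask: result
            for mask, result in self._results.items()
            if bin(mask).count("1") <= max_tokens
        }


sha_cache = ShaCache()
sha_cache.store("int main(){}", True)

mask_cache = MaskCache()
mask_cache.store([0, 2, 3], False)  # variant keeping tokens 0, 2, 3 of P
mask_cache.store([0], True)
mask_cache.refresh(max_tokens=2)    # evicts the three-token entry
```

The bitmask key is independent of the order in which the kept positions are listed, so the same deletion pattern always maps to the same cache entry. Two different deletion patterns that happen to yield identical text would still occupy separate entries in this simplified version.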


Articles published on Memory Efficiency

Marker discovery in the large.

Hydrogen assisted cracking using an efficient virtual element scheme

LM-SRPQ: Efficiently Answering Regular Path Query in Streaming Graphs

GLULA: Linear attention-based model for efficient human activity recognition from wearable sensors.

Digit-Serial DA-Based Fixed-Point RNNs: A Unified Approach for Enhancing Architectural Efficiency.

A Non-parametric Bootstrap Method for Kinetic Monte Carlo Variance Reduction

Dynamic Recognition of Speakers for Consent Management by Contrastive Embedding Replay.

Validation of Smagorinsky LES turbulence model in FluidX3D LBM: In-place vs central difference

Innovative Dual-Decoupling CNN with Layer-wise Temporal-Spatial Attention for Sensor-Based Human Activity Recognition.

Sleep Disruption, Fatigue, and Altered Neurobehavioral Performance Among Flight Controllers During Spaceflight Operations

A 3D finite element spectral integral (FESI) method for acoustics

An efficient weighted partial MaxSAT encoding for scheduling in overloaded real-time systems

An Online Support Vector Machine Algorithm for Dynamic Social Network Monitoring

Optimal Configuration of Multi-Task Learning for Autonomous Driving.

Fast and Accurate Parasitic Extraction in Multichip Power Module Design Automation Considering Eddy-Current Losses

Enhancing Efficiency of the Fast Quantum Memory on Single-Atom in Cavity

Advanced Pattern-Mining System for Fake News Analysis

An empirical study of schema-associated mnemonic method for classical Chinese poetry in primary and secondary education based on cognitive schema migration theory

On the Caching Schemes to Speed Up Program Reduction

An Efficient and Light Transformer-Based Segmentation Network for Remote Sensing Images of Landscapes
