The emergence of data-intensive applications, such as Deep Neural Networks (DNNs), exacerbates the well-known memory bottleneck in computer systems and demands early attention in the design flow. Electronic System-Level (ESL) design using SystemC Transaction Level Modeling (TLM) enables effective performance estimation, design space exploration (DSE), and gradual refinement. However, memory contention is often only detectable after detailed TLM-2.0 approximately-timed or cycle-accurate RTL models are developed. A memory bottleneck detected at such a late stage can severely limit the available design choices or even require costly redesign. In this work, we propose a novel TLM-2.0 loosely-timed contention-aware (LT-CA) modeling style that offers high-speed simulation close to traditional loosely-timed (LT) models, yet shows the same accuracy for memory contention as low-level approximately-timed (AT) models. Thus, our proposed LT-CA modeling breaks the speed/accuracy tradeoff between regular LT and AT models and offers fast and accurate observation and visualization of memory contention. Our extensible SystemC model generator automatically produces desired TLM-1 and TLM-2.0 models from a DNN architecture description for design space exploration focusing on memory contention. We demonstrate our approach with a real-world industry-strength DNN application, GoogLeNet. The experimental results show that the proposed LT-CA modeling is 46× faster in simulation than equivalent AT models with an average error of less than 1% in simulated time. Early detection of memory contention also suggests that local memories close to computing cores can eliminate memory contention in such applications.
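To make the notion of contention-aware loosely-timed modeling concrete, the sketch below shows one possible SystemC TLM-2.0 memory target that serializes overlapping b_transport calls and folds the resulting wait time into the annotated LT delay. This is a minimal illustrative assumption of what an LT-CA target could look like, not the paper's actual implementation; the module name ContentionAwareMemory, the constant ACCESS_LATENCY, and the member m_next_free are hypothetical names introduced here for the example.

```cpp
// Minimal sketch (assumed, not the paper's LT-CA code): a loosely-timed
// TLM-2.0 memory target that approximates contention by serializing
// overlapping accesses and adding the stall time to the annotated delay.
#include <systemc>
#include <tlm>
#include <tlm_utils/simple_target_socket.h>

struct ContentionAwareMemory : sc_core::sc_module
{
    tlm_utils::simple_target_socket<ContentionAwareMemory> socket;

    // Fixed service latency per access (illustrative value).
    const sc_core::sc_time ACCESS_LATENCY{20, sc_core::SC_NS};

    // Absolute simulation time at which the memory becomes free again.
    sc_core::sc_time m_next_free{sc_core::SC_ZERO_TIME};

    SC_CTOR(ContentionAwareMemory) : socket("socket")
    {
        socket.register_b_transport(this, &ContentionAwareMemory::b_transport);
    }

    void b_transport(tlm::tlm_generic_payload& trans, sc_core::sc_time& delay)
    {
        // Effective arrival time = current time + delay already annotated
        // by the loosely-timed initiator.
        sc_core::sc_time arrival = sc_core::sc_time_stamp() + delay;

        // If an earlier access still occupies the memory, this request
        // stalls; this stall is what plain LT models lose and AT models
        // capture with detailed phases.
        sc_core::sc_time start = (arrival > m_next_free) ? arrival : m_next_free;
        sc_core::sc_time wait  = start - arrival;

        m_next_free = start + ACCESS_LATENCY;

        // Fold contention (wait) plus service time into the LT delay.
        delay += wait + ACCESS_LATENCY;
        trans.set_response_status(tlm::TLM_OK_RESPONSE);
    }
};
```

Under these assumptions, the target stays purely blocking (no AT phases or payload event queues), so simulation speed remains close to a plain LT model while per-access stall times remain observable for contention analysis.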