Abstract

On-chip clock networks are remarkable in their impact on the performance and power of synchronous circuits, in their susceptibility to adverse effects of semiconductor technology scaling, and in their strong potential for improvement through better CAD algorithms and tools. Existing literature is rich in ideas and techniques but performs large-scale optimization using analytical models that have lost accuracy at recent technology nodes and have rarely been validated by realistic SPICE simulations on large industry designs. Our work offers a methodology for SPICE-accurate optimization of clock networks, coordinated to satisfy slew constraints and achieve the best tradeoffs among skew, insertion delay, power, and tolerance to variations. Our implementation, called Contango, is evaluated on 45 nm benchmarks from IBM Research and Texas Instruments with up to 50 K sinks. It outperforms all published results in terms of skew and shows superior scalability.

Highlights

  • Accurate distribution of clock signals is a major limiting factor for high-performance integrated circuits when unintended clock skew narrows down the useful portion of the clock cycle

  • Bounded-skew tree (BST)/deferred merging and embedding (DME) algorithms [6], developed in the 1990s, reduced skew to single-digit ps in fairly large circuits while requiring only minutes of CPU time

  • When BST/DME algorithms were introduced in the early 1990s, many chip designs included one large central buffer to drive clock signals through the entire chip

Summary

INTRODUCTION

Accurate distribution of clock signals is a major limiting factor for high-performance integrated circuits when unintended clock skew narrows down the useful portion of the clock cycle. Intra-die variations may be stronger on some paths than on others, which would further increase effective skew. These challenges have motivated research at the device, circuit, and algorithm levels [17]. Our work focuses on clock-network synthesis for ASICs and SoCs, where clock frequencies are not as aggressive as in high-performance CPUs, but power is limited, especially for portable applications. In this context, tree topologies remain the most popular choice, potentially with further tuning and enhancements. We propose a methodology for clock-tree optimization that outperforms the best results at the ISPD'09 contest on every benchmark by 2.15–3.99 times, while reducing skew to 2.2–4.6 ps (Table V).
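
To make the notion of skew concrete, the following is a minimal sketch, not taken from the paper. It assumes skew is measured as the spread between the latest and earliest clock arrival times at the sinks, and it uses a deliberately simplified model in which the full skew is subtracted from the clock period. The function names and the sample arrival times are hypothetical.

```python
def clock_skew(arrival_times_ps):
    """Spread between the latest and earliest sink arrival times, in ps."""
    return max(arrival_times_ps) - min(arrival_times_ps)


def useful_cycle_ps(period_ps, arrival_times_ps):
    """Clock period left after budgeting for the full skew (simplified model)."""
    return period_ps - clock_skew(arrival_times_ps)


# Hypothetical insertion delays at four sinks, in picoseconds.
arrivals = [512.0, 514.5, 513.0, 516.0]

skew = clock_skew(arrivals)
useful = useful_cycle_ps(1000.0, arrivals)  # 1000 ps period = 1 GHz clock
print(f"skew = {skew} ps, useful cycle = {useful} ps")
```

In a real flow the arrival times would come from SPICE or a timing engine, and the timing budget would also account for variation, which this sketch ignores.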

BACKGROUND
PROBLEM ANALYSIS
Nominal skew optimization
CLR optimization
Coordinating multiple optimizations
PROPOSED SOC CLOCK-SYNTHESIS METHODOLOGY
Obstacle-avoiding clock trees
Initial buffer insertion with sizing
Buffer sliding and interleaving
Iterative buffer sizing
Iterative top-down wiresizing
Iterative top-down wiresnaking
EMPIRICAL VALIDATION
CONCLUSIONS