The clock distribution network in a synchronous digital circuit delivers a clock signal to every storage element, that is, clock sink in the circuit. However, since the continued technology scaling increases PVT (process-voltage-temperature) variation, the increase of clock-skew variation is highly likely to cause performance degradation or system failure at runtime. Recently, to mitigate the clock-skew variation, many researchers have taken a profound interest in the clock mesh network. However, though the structure of the clock mesh network is excellent in tolerating timing variations, it demands significantly high power consumption due to the use of excessive mesh wire and buffer resources. Thus, optimizing the resources required in the mesh clock synthesis while maintaining the variation tolerance is crucially important. The three major tasks that greatly affect the cost of the resulting clock mesh are: (1) mesh segment allocation , (2) mesh buffer allocation and sizing , and (3) clock sink binding to mesh segments . Previous clock mesh optimization approaches solve the three tasks sequentially, one by one at a time, to manage the runtime complexity of the tasks at the expense of losing the quality of results. However, since the three tasks are tightly interrelated, simultaneously optimizing all three tasks is essential, if the runtime is ever permitted, to synthesize an economical clock mesh network. In this work, we propose an approach that is able to tackle the problem in an integrated fashion by combining the three tasks into an iterative framework of incremental updates and solving them simultaneously to find a globally optimal allocation of mesh resources while taking into account the clock-skew tolerance constraints. The core parts of this work are a precise analysis on the relation among the resource optimization tasks and an establishment of a mechanism for effective and efficient integration of the tasks. In particular, to handle the runtime problem, we propose a set of speedup techniques, that is, modeling the RC circuit for eliminating redundant matrix multiplications, exploiting a sliding-window scheme, and quickly estimating the buffer sizing effect, which are fitted into our context of fast clock-skew estimation in mesh resource optimization as well as an invention of early decision policies. Through extensive experiments with benchmark circuits, it is shown that our proposed clock mesh synthesizer is able to reduce the worst-case clock skew, total mesh wirelength, total size of mesh driving buffers, and total clock mesh power consumption including short-circuit power by 25.0%, 13.2%, 10.9%, and 11.0% on average compared to that produced by the best-known clock mesh synthesis method (MeshWorks), respectively.
Read full abstract