Abstract

Existing deep learning compilers cannot perform efficient, hardware-aware graph fusion when both time and power consumption are taken into account. Moreover, they optimize the computational graph of a deep neural network (DNN) through static, greedy graph transformations that consider only runtime performance and ignore the cost of the tuning process itself. To address these problems, this paper proposes PCGC, a DNN computational graph optimization compiler. Driven by runtime performance feedback, PCGC applies a graph fusion and splitting optimization strategy built on multilevel operator fusion and splitting rules. First, PCGC uses a rule-guided graph segmentation algorithm to recursively partition the computational graph into smaller subgraphs, enabling an efficient, fine-grained search. Then, PCGC feeds hardware performance measurements into a cost model; the cost model and the operator fusion rules jointly guide the partial fusion and partitioning of the graph's nodes and edges, so that PCGC flexibly generates optimal subgraphs for different hardware targets and prunes the search space for partial fusion. Finally, the cost model is made to converge quickly to a preset loss value through manual parameter tuning. Compared with other state-of-the-art compilers, and under the constraint that the time consumption on each platform is no worse than the average, PCGC improves overall power consumption by an average of 130.5% on an embedded GPU, by an average of 66.5% on a domain-specific architecture, and by 66.1% on an FPGA. In this sense, PCGC enables high-speed inference in power-constrained scenarios and reduces the carbon emissions of edge computing.
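To make the two-stage pipeline in the abstract concrete, the following is a minimal Python sketch of rule-guided recursive segmentation followed by cost-model-guided fusion. Every name in it (Node, Subgraph, the FUSABLE rule table, the uniform cost function) is a hypothetical stand-in for illustration only; PCGC's actual cost model is trained on latency and power measurements from real hardware rather than the toy node count used here.

```python
from dataclasses import dataclass, field

# Hypothetical IR types standing in for PCGC's computational graph (not the paper's actual data structures).
@dataclass
class Node:
    op: str
    inputs: list = field(default_factory=list)

@dataclass
class Subgraph:
    nodes: list

# Hypothetical operator fusion rules: producer/consumer pairs allowed to fuse.
FUSABLE = {("conv2d", "relu"), ("matmul", "add"), ("add", "relu")}

def can_fuse(producer: Node, consumer: Node) -> bool:
    return (producer.op, consumer.op) in FUSABLE

def segment(nodes, max_size=4):
    """Recursively split the graph into subgraphs of at most max_size nodes,
    mirroring the rule-guided segmentation step described in the abstract."""
    if len(nodes) <= max_size:
        return [Subgraph(list(nodes))]
    mid = len(nodes) // 2
    return segment(nodes[:mid], max_size) + segment(nodes[mid:], max_size)

def cost(subgraph: Subgraph, fused: list) -> float:
    """Toy stand-in for the hardware-feedback cost model: one unit per
    unfused node. A real model would be fit to measured latency and power."""
    return len(subgraph.nodes) - len(fused)

def optimize(subgraph: Subgraph):
    """Apply fusion rules to adjacent producer/consumer pairs and report
    the resulting cost, emulating cost-model-guided partial fusion."""
    fused = []
    for prev, cur in zip(subgraph.nodes, subgraph.nodes[1:]):
        if can_fuse(prev, cur):
            fused.append((prev.op, cur.op))
    return fused, cost(subgraph, fused)

if __name__ == "__main__":
    graph = [Node("conv2d"), Node("relu"), Node("matmul"), Node("add"), Node("relu")]
    for sg in segment(graph, max_size=3):
        fused, c = optimize(sg)
        print([n.op for n in sg.nodes], "->", fused, "cost:", c)
```

In this sketch the segmentation bound and the rule table are the knobs a target-specific tuner would adjust; the abstract's feedback loop corresponds to replacing cost() with measurements from the device under test.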
