Distributed Control Independence for Composable Multi-processors

Mengjie Mao, Tao Sun, Hong An, Xuechao Wei, Bobin Deng, Qi Li, Junrui Zhou

doi:10.1109/icis.2012.45

Abstract

Composable multi-processors employ large instruction windows and a distributed layout, both of which amplify the branch misprediction penalty. By the time a branch misprediction is detected, hundreds or thousands of instructions may be in flight. Simply squashing every instruction that follows the mispredicted branch wastes a large amount of work, so branch misprediction becomes the key bottleneck in these systems. In this paper, we introduce Distributed Control Independence (DCI) to reduce the branch misprediction bottleneck in a composable multi-processor named TFlex. With control independence, the misprediction penalty can be alleviated by preserving the useful work already done by later control-independent instructions. We found that only a small fraction of the preserved instructions, namely those whose data depends on control-dependent instructions, need to be re-executed. DCI achieves high hardware efficiency and performance scalability. Our experimental results show that DCI effectively mitigates the branch misprediction bottleneck and speeds up the baseline TFlex by a geometric mean of 35% when running diverse applications on a 16-core TFlex configuration.
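The selective recovery idea summarized above can be illustrated with a minimal sketch. This is not the DCI hardware mechanism or the TFlex implementation; it is a simplified software model, and the names `Instr`, `classify_after_misprediction`, and `demo_window` are hypothetical. It assumes each in-flight instruction is already labeled as control dependent or control independent with respect to the mispredicted branch, and it treats any consumer of a squashed or re-executed result as needing re-execution.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    """Hypothetical model of one in-flight instruction after the branch."""
    name: str
    control_dependent: bool  # True if it lies on the mispredicted path
    sources: list = field(default_factory=list)  # names of producer instructions


def classify_after_misprediction(window):
    """Split instructions (given in program order) into three groups.

    squash:    control-dependent instructions, discarded outright
    reexecute: control-independent instructions whose inputs came, directly
               or transitively, from a squashed instruction
    keep:      control-independent instructions whose results stay valid
    """
    squash, reexecute, keep = [], [], []
    tainted = set()  # names of instructions whose results are no longer valid
    for ins in window:
        if ins.control_dependent:
            squash.append(ins.name)
            tainted.add(ins.name)
        elif any(src in tainted for src in ins.sources):
            reexecute.append(ins.name)
            tainted.add(ins.name)  # its consumers must be re-executed too
        else:
            keep.append(ins.name)
    return squash, reexecute, keep


# Illustrative window: i1 and i2 are on the wrong path, the rest are not.
demo_window = [
    Instr("i1", control_dependent=True),
    Instr("i2", control_dependent=True, sources=["i1"]),
    Instr("i3", control_dependent=False),
    Instr("i4", control_dependent=False, sources=["i2"]),
    Instr("i5", control_dependent=False, sources=["i4"]),
    Instr("i6", control_dependent=False, sources=["i3"]),
]

squash, reexecute, keep = classify_after_misprediction(demo_window)
```

In this toy window, only the instructions reachable through data dependences from the wrong path are re-executed, while the remaining control-independent work is preserved, which is the effect the abstract attributes to control independence.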

Full Text