Abstract

Composable multiprocessors employ large instruction windows and a distributed layout, both of which amplify the branch misprediction penalty. When a branch misprediction is detected, hundreds or thousands of instructions may be in flight. Simply squashing all instructions that follow the mispredicted branch wastes a large amount of work, so branch misprediction becomes the key bottleneck in these systems. In this paper, we introduce Distributed Control Independence (DCI) to reduce the branch misprediction bottleneck in a composable multiprocessor named TFlex. With control independence, the branch misprediction penalty is alleviated by preserving the useful work of later control-independent instructions. We found that only a small fraction of the preserved instructions, those whose data depend on control-dependent instructions, need to be re-executed. DCI achieves high hardware efficiency and performance scalability. Our experimental results show that DCI effectively mitigates the branch misprediction bottleneck and speeds up the baseline TFlex by a geometric mean of 35% when running diverse applications on a 16-core TFlex configuration.
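
The selective re-execution idea can be illustrated with a minimal sketch. This is not the DCI hardware mechanism or the TFlex block-based execution model; it is a generic register-level model written for illustration. The `Instr` record, the `classify` function, and the register names are all hypothetical. The sketch also assumes, for simplicity, that only registers written by wrong-path instructions can hold stale values; a real design must also account for registers written on the correct path.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    name: str
    dest: str                # register written (hypothetical model)
    srcs: tuple              # registers read
    control_dependent: bool  # True if fetched on the mispredicted path

def classify(window):
    """Split in-flight instructions (in program order) that follow a
    mispredicted branch into three groups: squash, re-execute, keep."""
    tainted = set()  # registers whose current value may be wrong
    squash, reexecute, keep = [], [], []
    for ins in window:
        if ins.control_dependent:
            # Wrong-path work is discarded; its result is unreliable.
            squash.append(ins.name)
            tainted.add(ins.dest)
        elif any(src in tainted for src in ins.srcs):
            # Control independent, but it consumed a possibly wrong value.
            reexecute.append(ins.name)
            tainted.add(ins.dest)
        else:
            # Control and data independent: the completed work is preserved.
            keep.append(ins.name)
            tainted.discard(ins.dest)  # overwritten with a valid value
    return squash, reexecute, keep

window = [
    Instr("i1", "r1", ("r0",), True),        # on the mispredicted path
    Instr("i2", "r2", ("r1",), False),       # reads tainted r1
    Instr("i3", "r3", ("r4",), False),       # independent
    Instr("i4", "r5", ("r2", "r3"), False),  # reads tainted r2
    Instr("i5", "r1", ("r4",), False),       # overwrites r1 with a valid value
    Instr("i6", "r6", ("r1",), False),       # reads the now-valid r1
]
squash, reexecute, keep = classify(window)
```

In this toy window, only `i2` and `i4` are re-executed, while `i3`, `i5`, and `i6` keep their results, which mirrors the observation that only the data-dependent subset of the preserved instructions needs re-execution.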
