Abstract
Fine-grain thread parallelism using task based programming models are a new trend in achieving massively parallel computations. Often, software pre-fetching and queuing mechanisms for managing these dynamic environments are inadequate, failing to keep the processor cores busy with computation. At the same time, the CPU-memory performance gap is getting worse and this puts a strain on memory subsystem to keep cores in a busy state. We describe a hardware based pre-fetching and queuing mechanism aimed at assisting the over-subscription of very lightweight threads per core. Experiments with a soft processor and a reconfigurable accelerator core are reported. The hardware demonstrates the ability to block on out-of-order memory transactions and alleviates the software bottleneck.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.