Abstract

Fine-grain thread parallelism using task based programming models are a new trend in achieving massively parallel computations. Often, software pre-fetching and queuing mechanisms for managing these dynamic environments are inadequate, failing to keep the processor cores busy with computation. At the same time, the CPU-memory performance gap is getting worse and this puts a strain on memory subsystem to keep cores in a busy state. We describe a hardware based pre-fetching and queuing mechanism aimed at assisting the over-subscription of very lightweight threads per core. Experiments with a soft processor and a reconfigurable accelerator core are reported. The hardware demonstrates the ability to block on out-of-order memory transactions and alleviates the software bottleneck.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.