Abstract

Traditional widely used parallel programming models and methods focus on data distribution and are suitable for implementing data parallelism. They lack the abstraction of task parallelism and make it inconvenient to separate the applications' high level structure from low level implementation and execution. To improve this, we propose a parallel programming framework based on the tasks and conduits (TNC) model. In this framework, we provide tasks and conduits as the basic components to construct applications at a higher level. Users can easily implement coarse-grained task parallelism with multiple tasks running concurrently. When running on different platforms, the application main structure can stay the same and only adapt task implementations based on the target platforms, improving maintenance and portability of parallel programs. For a single task, we provide multiple levels of shared memory concepts, allowing users to implement fine grained data parallelism through groups of threads across multiple nodes. This provides users a flexible and efficient means to implement parallel applications. By extending the framework runtime system, it is able to launch and run GPU tasks to make use of GPUs for acceleration. The support of both CPU tasks and GPU tasks helps users develop and run parallel applications on heterogeneous platforms. To demonstrate the use of our framework, we tested it with some kernel applications. The results show that the applications' performance using our framework is comparable to traditional programming methods. Further, with the use of GPU tasks, we can easily adjust the program to leverage GPUs for acceleration. In our tests, a single GPU's performance is comparable to a 4 node multicore CPU cluster.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call