Inter-thread Communication Research Articles

Computer technology has revolutionized science and has motivated scientists to develop mathematical models that simulate salient features of the physical universe. These models can approximate reality at many scales, such as the atomic nucleus, the Earth's biosphere, and weather and climate assessment. The greater the available computing power, the closer the approximation comes to reality. Solving such problems requires processing power in the teraflops to petaflops range, and the way to speed up the computation is to "parallelize" it. One approach is to use a multimillion-dollar supercomputer; another is to use a computational grid (also called the poor man's supercomputer) built from geographically distributed resources. For example, SETI@home, which is used to detect radio waves emitted by intelligent civilizations outside Earth, has 4.6 million participating computers. Many tools are available for building such grids, including Globus Toolkit, Entropia, Legion, and BOINC, but they are mainly based on the Linux platform. Because the majority of available computers are Windows based, it should be easier to build a larger network that uses the free cycles of these computers to solve complex problems on the Windows platform. To that end, Nimble@ITCEcnoGrid has been developed. It includes inter-thread communication, a feature that is missing from the available toolkits. The Nimble@ITCEcnoGrid framework (a fast grid with inter-thread communication and an economic-based policy) was tested by computing pi to 120 decimal places. Encouraged by the speed, the same system has been used to compute the momentum, thermodynamic, and continuity equations for weather forecasting on Windows-based desktop computers.
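To make the idea of worker threads reporting results to one another concrete, here is a minimal sketch in Python. It is not the Nimble@ITCEcnoGrid implementation, whose API the abstract does not describe. It assumes a midpoint-rule approximation of the integral of 4/(1+x^2) over [0, 1], and the names `partial_pi` and `parallel_pi` are hypothetical. It works in double precision only, so it does not reach 120 decimal places, and in CPython the threads illustrate the communication pattern rather than a real speedup.

```python
import math
import queue
import threading


def partial_pi(start, stop, n, out):
    """Midpoint-rule partial sum of the integral of 4/(1+x^2) on [0, 1]."""
    h = 1.0 / n
    total = 0.0
    for i in range(start, stop):
        x = (i + 0.5) * h
        total += 4.0 / (1.0 + x * x)
    # Inter-thread communication: hand the partial result to the collector.
    out.put(total * h)


def parallel_pi(n=200_000, workers=4):
    """Split n subintervals across worker threads and sum their results."""
    results = queue.Queue()  # thread-safe channel between workers and caller
    chunk = n // workers
    threads = []
    for w in range(workers):
        start = w * chunk
        # The last worker also takes any leftover subintervals.
        stop = n if w == workers - 1 else start + chunk
        t = threading.Thread(target=partial_pi, args=(start, stop, n, results))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    # Every worker has put exactly one value on the queue by now.
    return sum(results.get() for _ in range(workers))
```

Calling `parallel_pi()` returns a floating-point approximation of pi. The queue is the only shared state, so the workers need no explicit locking.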


This paper studies how to parallelize emerging media mining workloads on existing small-scale multi-core processors and future large-scale platforms. Media mining is an emerging technology for extracting meaningful knowledge from large amounts of multimedia data, with the aim of helping end users search, browse, and manage that data. Many media mining applications are very complicated and require a huge amount of computing power. The advent of multi-core architectures provides an opportunity to accelerate media mining. However, to use multi-core processors efficiently, we must effectively execute many threads at the same time. In this paper, we present how to exploit multi-core processors to speed up computation-intensive media mining applications. We first parallelize two media mining applications by extracting coarse-grained parallelism and evaluate their parallel speedups on a small-scale multi-core system. Our experiment shows that the coarse-grained parallelization achieves good, but not perfect, scaling. When examining the memory requirements, we find that these coarse-grained parallelized workloads have high memory demand. Their working set sizes increase almost linearly with the degree of parallelism, and their instantaneous memory bandwidth usage prevents them from scaling perfectly on the 8-core machine. To avoid the memory bandwidth bottleneck, we turn to fine-grained parallelism and evaluate the parallel performance on the 8-core machine and a simulated 64-core processor. Experimental data show that the fine-grained parallelization has much lower memory requirements than the coarse-grained one, but exhibits significant read-write data sharing. As a result, expensive inter-thread communication limits the parallel speedup on the 8-core machine, while excellent speedup is observed on the large-scale processor, where a shared cache provides fast core-to-core communication.
Our study suggests that (1) extracting coarse-grained parallelism scales well on small-scale platforms, but poorly on large-scale systems; (2) exploiting fine-grained parallelism is suitable for realizing the power of large-scale platforms; (3) future many-core chips can provide a shared cache and sufficient on-chip interconnect bandwidth to enable efficient inter-core communication for applications with significant amounts of shared data. In short, this work demonstrates that proper parallelization techniques are critical to the performance of multi-core processors. We also demonstrate that performance analysis is one of the important factors in parallelization. The parallelization principles, practice, and performance analysis methodology presented in this paper are also useful to anyone who wants to exploit thread-level parallelism in their applications.
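The difference between the two strategies can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's code. The kernel `frame_energy` is a hypothetical stand-in for a media mining computation, and `coarse_grained` and `fine_grained` are names invented for this example. In CPython the global interpreter lock prevents a real speedup from threads, so the sketch shows only how the work and the shared data are structured.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def frame_energy(frame):
    """Stand-in kernel (an assumption): sum of squared samples in a frame."""
    return sum(v * v for v in frame)


def coarse_grained(frames, workers=4):
    # One task per frame: tasks share nothing, but each worker holds a
    # whole frame, so the working set grows with the number of workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(frame_energy, frames))


def fine_grained(frames, workers=4):
    # All workers cooperate on one frame at a time: each takes a slice and
    # adds its partial result to a shared accumulator guarded by a lock.
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for frame in frames:
            shared = {"total": 0}
            lock = threading.Lock()

            def work(part):
                partial = frame_energy(part)
                with lock:  # read-write sharing between threads
                    shared["total"] += partial

            step = max(1, -(-len(frame) // workers))  # ceiling division
            slices = [frame[i:i + step] for i in range(0, len(frame), step)]
            list(pool.map(work, slices))  # wait until every slice is done
            results.append(shared["total"])
    return results
```

Both functions return one value per frame. The fine-grained version keeps only one frame in flight at a time, at the cost of synchronizing on the shared accumulator, which mirrors the memory and communication trade-off described above.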


Articles published on Inter-thread Communication

CODE TRANSFORMATIONS FOR ENHANCING THE PERFORMANCE OF SPECULATIVELY PARALLEL THREADS

Aikido

Parallel density matrix propagation in spin dynamics simulations

Nimble@ITCEcnoGrid: A Grid in Research Domain for Weather Forecasting

DeFT

Isolating and understanding concurrency errors using reconstructed execution fragments

Multithreaded Simulation for Synchronous Dataflow Graphs

CRITICAL-PATH DRIVEN ROUTERS FOR ON-CHIP NETWORKS

Composable specifications for structured shared-memory communication

Parallel generation of multiple L-systems

CoreDet

Leakage-saving opportunities in mesh-based massive multi-core architectures

DMP

Parallelization Strategies and Performance Analysis of Media Mining Applications on Multi-Core Processors

Performance scalability of decoupled software pipelining

Dual-thread Speculation: A Simple Approach to Uncover Thread-level Parallelism on a Simultaneous Multithreaded Processor

Physical simulation for animation and visual effects
