Abstract

Internet applications, such as network search, electronic commerce, and modern medical applications, produce and process massive amounts of data, and the computation in these data-intensive applications exhibits considerable data parallelism. Breadth-first search (BFS), a graph traversal algorithm, is fundamental to many graph processing applications and metrics, especially as graphs grow in scale. Because its inherently irregular memory access patterns cause poor temporal and spatial locality, a variety of scientific programming methods have been proposed to accelerate and parallelize BFS. However, new parallel hardware could improve further on these methods. To address small-world graph problems, we propose a novel, scalable heterogeneous multicore system based on field-programmable gate arrays (FPGAs) for scientific programming. Each core is multithreaded for streaming processing, and the InfiniBand communication network is adopted for scalability. We design a binary search algorithm for address mapping that unifies the addresses of all processors. We tested the system with a 1D parallel hybrid BFS algorithm within the limits permitted by the Graph500 benchmark. Our system of 8 cores with 8 threads per core achieved superior performance and efficiency compared with prior work at the same degree of parallelism. Our system is efficient not as a special-purpose acceleration unit but as a processor platform for graph searching applications.
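The abstract names a binary search address mapping but does not say how it works. One plausible reading, assumed here purely for illustration, is this: each core's local memory occupies a contiguous range of one unified global address space, and a global address is routed to its owning core by binary searching a sorted table of region base addresses. The class `AddressMapper` and its `translate` method are hypothetical names and are not taken from the paper.

```python
from bisect import bisect_right
from typing import List, Tuple


class AddressMapper:
    """Hypothetical unified address space split into per-core regions.

    Assumption: core i owns the contiguous range [bases[i], bases[i] + sizes[i]).
    """

    def __init__(self, sizes: List[int]) -> None:
        # Prefix-sum the region sizes to get a sorted table of base addresses.
        self.bases: List[int] = []
        total = 0
        for size in sizes:
            self.bases.append(total)
            total += size
        self.limit = total  # one past the last valid global address

    def translate(self, global_addr: int) -> Tuple[int, int]:
        """Return (core_id, local_offset) for a global address via binary search."""
        if not 0 <= global_addr < self.limit:
            raise ValueError(f"address {global_addr:#x} is outside the unified space")
        # bisect_right returns the index of the first base strictly greater than
        # the address, so the owning region is the one just before it. With
        # empty (size 0) regions this still picks the non-empty owner.
        core = bisect_right(self.bases, global_addr) - 1
        return core, global_addr - self.bases[core]
```

For example, with eight equally sized 0x1000-byte regions, `AddressMapper([0x1000] * 8).translate(0x2345)` returns core 2 at local offset 0x345. One reason such a scheme might be preferred over a fixed bit-field decode is that it tolerates regions of unequal size while keeping the lookup to about log2(P) comparisons for P regions.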

Highlights

  • Information technology, the Internet, and intelligent technology have ushered in the era of big data

  • Many real-world applications can be abstracted as large graphs with millions of vertices, but processing such graphs is a considerable challenge

  • The pipeline was divided into Thread Select (TS), Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), and Write Back (WB) sections
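The highlights list the pipeline stages but not the thread scheduling policy. The sketch below assumes that Thread Select (TS) picks one of a core's threads per cycle in strict round-robin order with no stalls, as in a barrel processor. This is a hypothetical policy chosen for illustration. Under that assumption, when a core has at least as many threads as pipeline stages (the abstract reports 8 threads per core against 5 stages), no thread ever has more than one instruction in flight. The function names are illustrative only.

```python
from typing import List, Optional

# Stage order from the highlights: Thread Select, Instruction Fetch,
# Instruction Decode, Execute, Write Back.
STAGES = ["TS", "IF", "ID", "EX", "WB"]


def pipeline_occupancy(num_threads: int, cycles: int) -> List[List[Optional[int]]]:
    """For each cycle, list the thread occupying each stage (None if empty).

    Assumption: TS issues one instruction per cycle in strict round-robin
    order over the threads, and no instruction ever stalls.
    """
    table: List[List[Optional[int]]] = []
    for cycle in range(cycles):
        row: List[Optional[int]] = []
        for depth in range(len(STAGES)):
            issued_at = cycle - depth  # cycle on which this instruction occupied TS
            row.append(issued_at % num_threads if issued_at >= 0 else None)
        table.append(row)
    return table


def max_inflight_per_thread(num_threads: int, cycles: int) -> int:
    """Largest number of instructions any one thread has in the pipeline at once."""
    worst = 0
    for row in pipeline_occupancy(num_threads, cycles):
        occupied = [t for t in row if t is not None]
        for t in set(occupied):
            worst = max(worst, occupied.count(t))
    return worst


if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_occupancy(8, 6)):
        cells = " ".join(
            f"{stage}:T{t}" if t is not None else f"{stage}:--"
            for stage, t in zip(STAGES, row)
        )
        print(f"cycle {cycle}: {cells}")
```

With 8 threads, every stage holds a different thread by cycle 4 (T4 in TS down to T0 in WB), and `max_inflight_per_thread(8, 64)` is 1. With only 4 threads it rises to 2. Two instructions from the same thread would then share the pipeline, so the hardware would likely need forwarding or interlocks.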


Summary

Introduction

Information technology, the Internet, and intelligent technology have ushered in the era of big data. Many real-world applications can be abstracted as large graphs with millions of vertices, but processing such graphs is a considerable challenge. These graphs represent the connections, relations, and interactions among entities, as in social networks [2], biological interactions [3], and ground transportation [1]. Data-driven computation, unstructured organization, irregular memory access, and a low ratio of computation to memory access are the prime reasons that parallel large-graph processing is inefficient [4]. Chen et al. [6] proposed a new parallel model called the Codelet model. These approaches all do a good job of speeding up memory access.
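The irregular memory access described above appears directly in BFS, because each neighbor check indexes the parent array at an essentially arbitrary position. The abstract states that the system was evaluated with a 1D parallel hybrid BFS, but this excerpt does not give the algorithm. The sketch below is a sequential, single-core illustration of the widely used direction-optimizing (hybrid) BFS of Beamer et al., which switches between top-down and bottom-up steps. The compressed sparse row (CSR) layout, the helper names, and the thresholds `alpha=14` and `beta=24` are assumptions borrowed from that technique. They are not parameters reported for this system.

```python
from typing import List, Sequence, Tuple


def build_csr(num_vertices: int,
              edges: Sequence[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build an undirected graph in CSR form: row offsets plus column indices."""
    degree = [0] * num_vertices
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    offsets = [0] * (num_vertices + 1)
    for i in range(num_vertices):
        offsets[i + 1] = offsets[i] + degree[i]
    cols = [0] * offsets[-1]
    fill = list(offsets[:-1])  # next free slot in each vertex's row
    for u, v in edges:
        cols[fill[u]] = v
        fill[u] += 1
        cols[fill[v]] = u
        fill[v] += 1
    return offsets, cols


def hybrid_bfs(offsets: List[int], cols: List[int], root: int,
               alpha: float = 14, beta: float = 24) -> List[int]:
    """Direction-optimizing BFS; returns the parent array (-1 = unreached)."""
    n = len(offsets) - 1
    deg = [offsets[v + 1] - offsets[v] for v in range(n)]
    parent = [-1] * n
    parent[root] = root
    frontier = [root]
    unexplored_edges = len(cols) - deg[root]  # sum of degrees of unvisited vertices (m_u)
    bottom_up = False
    while frontier:
        frontier_edges = sum(deg[v] for v in frontier)  # m_f
        # Assumed switching rule: go bottom-up when the frontier touches many
        # edges, and return to top-down once the frontier becomes small.
        if not bottom_up and frontier_edges > unexplored_edges / alpha:
            bottom_up = True
        elif bottom_up and len(frontier) < n / beta:
            bottom_up = False
        next_frontier: List[int] = []
        if bottom_up:
            in_frontier = [False] * n
            for v in frontier:
                in_frontier[v] = True
            for v in range(n):
                if parent[v] != -1:
                    continue
                for i in range(offsets[v], offsets[v + 1]):
                    if in_frontier[cols[i]]:
                        parent[v] = cols[i]
                        next_frontier.append(v)
                        break  # one parent suffices: skip the remaining edges
        else:
            for u in frontier:
                for i in range(offsets[u], offsets[u + 1]):
                    v = cols[i]
                    if parent[v] == -1:  # scattered access into parent: poor locality
                        parent[v] = u
                        next_frontier.append(v)
        unexplored_edges -= sum(deg[v] for v in next_frontier)
        frontier = next_frontier
    return parent
```

The result is a valid BFS parent tree regardless of which direction each level uses. Only the number of edges inspected changes, which is why the switching thresholds are tuning parameters rather than correctness conditions.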

Related Works
Massive Parallel Coprocessor System Architecture
Binary Search Address Mapping Unit
Architecture of Streaming Processor
A Bank B memory
Three-Level Memory Hierarchy
Results and Comparison
Conclusions