Scalable Parallel Architecture Research Articles

Clustering is the most common method for organizing unlabeled data into natural groups (called clusters) based on some measure of similarity among data objects. The Partitioning Around Medoids (PAM) algorithm is a partitioning-based clustering method widely used for object categorization, image analysis, bioinformatics and data compression, but its high time complexity makes it impractical for large datasets and for embedded or real-time applications. In this work, we propose a simple and scalable parallel architecture for the PAM algorithm to reduce its running time. The architecture can be implemented either on a multi-core processor system to handle big data or on a reconfigurable hardware platform, such as FPGAs and MPSoCs, which makes it suitable for real-time clustering applications. The proposed model partitions the data equally among multiple processing cores. Each core executes the same sequence of tasks simultaneously on its own data subset and shares intermediate results with the other cores to produce the final result. Experiments show that the computational complexity of the PAM algorithm is reduced exponentially as the number of cores working in parallel increases. The speedup graph of the proposed model also becomes more linear as the number of data points increases and as the clusters become more uniform. The results further demonstrate that the proposed architecture produces the same results as the original PAM algorithm, but with reduced computational complexity.
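
The data-parallel idea in this abstract (equal partitions, the same task on every core, partial results merged) can be illustrated with a minimal Python sketch. It is not the authors' architecture: the Manhattan distance, the first-improvement swap loop, the naive initialisation and the use of threads as stand-ins for cores are all assumptions made for illustration, and every function name here is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def dist(a, b):
    """Manhattan distance between two points (tuples of numbers)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def partial_cost(chunk, medoids):
    """Cost contributed by one data subset: each point to its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in chunk)


def split_equally(points, n_workers):
    """Partition the data into n_workers nearly equal contiguous chunks."""
    size, extra = divmod(len(points), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        end = start + size + (1 if i < extra else 0)
        chunks.append(points[start:end])
        start = end
    return chunks


def total_cost(chunks, medoids, pool):
    """Every worker runs the same task on its chunk; partial sums are merged."""
    return sum(pool.map(lambda c: partial_cost(c, medoids), chunks))


def parallel_pam(points, k, n_workers=4, max_iter=100):
    """Simplified PAM swap phase with the cost evaluation spread over workers."""
    medoids = list(points[:k])  # naive initialisation (assumption)
    chunks = split_equally(points, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        best = total_cost(chunks, medoids, pool)
        for _ in range(max_iter):
            improved = False
            # Try replacing medoid i with each non-medoid point.
            for i, candidate in product(range(k), points):
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(chunks, trial, pool)
                if cost < best:  # accept the first improving swap
                    medoids, best, improved = trial, cost, True
            if not improved:
                break
    return medoids, best
```

Because the merged cost is a plain sum of per-chunk sums, the number of workers changes only how the work is divided; with integer coordinates the result does not depend on it. In CPython, threads will not give a real speedup for this pure-Python workload, so a process pool or a compiled kernel would be needed to observe one.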

Approximation of the discrete cosine transform (DCT) is useful for reducing its computational complexity without significant impact on its coding performance. Most existing algorithms for approximating the DCT target only small transform lengths, and some of them are non-orthogonal. This paper presents a generalized recursive algorithm to obtain an orthogonal approximation of the DCT, where an approximate DCT of length $N$ can be derived from a pair of DCTs of length $(N/2)$ at the cost of $N$ additions for input preprocessing. We perform recursive sparse matrix decomposition and make use of the symmetries of the DCT basis vectors to derive the proposed approximation algorithm. The proposed algorithm is highly scalable for hardware as well as software implementation of DCTs of higher lengths, and it can make use of an existing approximation of the 8-point DCT to obtain an approximate DCT of any power-of-two length $N>8$. We demonstrate that the proposed approximation of the DCT provides comparable or better image and video compression performance than existing approximation methods. It is shown that the proposed algorithm involves lower arithmetic complexity than the other existing approximation algorithms. We present a fully scalable reconfigurable parallel architecture for the computation of the approximate DCT based on the proposed algorithm. One uniquely interesting feature of the proposed design is that it can be configured for the computation of a 32-point DCT, or for parallel computation of two 16-point DCTs or four 8-point DCTs, with a marginal control overhead. The proposed architecture is found to offer many advantages in terms of hardware complexity, regularity and modularity. Experimental results obtained from an FPGA implementation show the advantage of the proposed method.
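
The recursive structure described in this abstract (a length-$N$ transform built from two length-$N/2$ transforms after $N$ additions of input preprocessing) can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's algorithm: the base case here is the exact orthonormal DCT-II rather than a multiplierless 8-point approximation, the odd-indexed outputs are simply assigned from a second half-length transform of the input differences, and all function names are hypothetical.

```python
import math


def dct_matrix(n):
    """Exact orthonormal DCT-II matrix of size n x n (placeholder base case)."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows


def mat_vec(matrix, x):
    """Matrix-vector product on plain Python lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


def recursive_transform(x, base=8):
    """Length-N transform from two length-N/2 transforms (N = base * 2**m)."""
    n = len(x)
    if n <= base:
        return mat_vec(dct_matrix(n), x)
    half = n // 2
    # Input preprocessing: N/2 additions and N/2 subtractions (N in total).
    sums = [x[i] + x[n - 1 - i] for i in range(half)]
    diffs = [x[i] - x[n - 1 - i] for i in range(half)]
    even = recursive_transform(sums, base)   # feeds the even-indexed outputs
    odd = recursive_transform(diffs, base)   # assumption: stands in for the odd half
    scale = 1 / math.sqrt(2)                 # keeps the overall transform orthogonal
    out = [0.0] * n
    for k in range(half):
        out[2 * k] = scale * even[k]
        out[2 * k + 1] = scale * odd[k]
    return out
```

With an orthogonal base transform, the composed transform stays orthogonal, because the scaled sum/difference stage is itself orthogonal. For a single recursion level the even-indexed outputs coincide with those of the exact DCT of the full length; only the odd-indexed outputs are approximated.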

Articles published on Scalable Parallel Architecture

A Parallel Architecture for the Partitioning Around Medoids (PAM) Algorithm for Scalable Multi-Core Processor Implementation with Applications in Healthcare

SNAVA—A real-time multi-FPGA multi-model spiking neural network simulation architecture

A Scalable Parallel Architecture Based on Many-Core Processors for Generating HTTP Traffic

A Generalized Algorithm and Reconfigurable Architecture for Efficient and Scalable Orthogonal Approximation of DCT

Biswapped networks: a family of interconnection architectures with advantages over swapped or OTIS networks

New Methodologies for Parallel Architecture

A Parallel Efficient Architecture for Large Cryptographically Robust n × k (k>n/2) Mappings

FPGA-Based Multiple-Channel Vibration Analyzer for Industrial Applications in Induction Motor Failure Detection

Hardware Acceleration of HMMER on FPGAs

Swapped interconnection networks: Topological, performance, and robustness attributes

HPF+: High Performance Fortran for advanced scientific and engineering applications

VFC: The Vienna Fortran Compiler

Evaluation of Cluster-based System for the OLTP Application

Incorporating crystallographic texture in deformation process simulations

An Architecture for High Availability Multi-user Systems

DESIGN AND SIMULATION OF THE ON-LINE TRIGGER AND RECONSTRUCTION FARM FOR THE HERA-B EXPERIMENT

A VLSI PARALLEL ARCHITECTURE FOR FUZZY EXPERT SYSTEMS

Performance bounds for column-block partitioning of parallel Gaussian elimination and Gauss-Jordan methods
