RTL Evaluation of ℓ2-Norm Approximation with Rotated ℓ1-Norm for 2-Tuple Arrays
This paper introduces a hardware-friendly, high-precision approximation method for the ℓ2-norm of 2-tuple data arrays using rotated ℓ1-norm evaluation with fixed-point arithmetic, demonstrating improved performance in FPGA-based image restoration with favorable quality, circuit area, latency, and throughput compared to traditional approaches.
This study proposes a high-precision fast approximation method for the ℓ2-norm evaluation of 2-tuple data arrays using a rotated ℓ1-norm evaluation with fixed-point arithmetic. In several signal processing applications, such as image restoration with isotropic total variation (TV) or with complex ℓ1-norm regularization, a large number of 2-tuple ℓ2-norm calculations is required. To achieve a hardware (HW)-friendly calculation, the square and square-root operations involved in the ℓ2-norm calculation must be adequately approximated, but several existing techniques struggle to do so. Thus, in this paper, a HW-friendly approximation algorithm is proposed. The proposed method uses the fact that the upper bound of the surface of a first-order rotational cone traces a second-order cone, that is, the ℓ2-cone. As a result, fewer variable multiplications are required, and parallel implementation is easily achieved using fixed-point arithmetic. To demonstrate the effectiveness of the proposed method, it was applied to image restoration, and its performance on field programmable gate arrays (FPGAs) was evaluated in terms of quality, circuit area, latency, and throughput. The effectiveness of the proposed method is verified by comparison with typical implementations using commercial circuits.
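The geometric idea behind such approximations can be sketched in software (a hypothetical illustration only, not the paper's RTL design): the ℓ2-norm of a 2-tuple equals its largest absolute projection over all unit directions, so maximizing over a few precomputed rotated directions gives a multiplier-light under-approximation of √(x² + y²).

```python
import math

def l2_approx(x, y, n_angles=8):
    """Approximate sqrt(x^2 + y^2) as the largest absolute projection
    of (x, y) onto n_angles evenly spaced unit directions.
    The result under-estimates the true norm by at most a factor
    cos(pi / (2 * n_angles))."""
    best = 0.0
    for k in range(n_angles):
        theta = math.pi * k / n_angles  # directions cover a half-circle
        p = abs(x * math.cos(theta) + y * math.sin(theta))
        if p > best:
            best = p
    return best
```

In a hardware realization the cosine/sine factors would be fixed-point constants, so each direction costs only two constant multiplications and one addition, which is the kind of trade-off the abstract describes.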
- Research Article
80
- 10.1016/j.camwa.2017.05.004
- Jun 9, 2017
- Computers & Mathematics with Applications
Isotropic and anisotropic total variation regularization in electrical impedance tomography
- Conference Article
- 10.1117/12.2568954
- Aug 21, 2020
Denoising has numerous applications in communications, control, machine learning, and many other fields of engineering and science. Total variation (TV) regularization is a widely used technique for signal and image restoration. There are two types of TV regularization problem: anisotropic TV and isotropic TV. One of the key difficulties in the TV-based image denoising problem is the nonsmoothness of the TV norms. Exact solution methods are known for the 1D TV regularization problem. Strong and Chan derived exact solutions to the TV regularization problem in the one-dimensional case, obtained when the original noise-free function, the noise, and the regularization parameter are subject to special constraints. Davies and Kovac considered the problem as non-parametric regression with emphasis on controlling the number of local extrema, and in particular considered the run and taut string methods. Condat proposed a direct fast algorithm for finding exact solutions to the one-dimensional TV regularization problem for discrete functions. In the 2D case, some methods are used to approximate exact solutions to the TV regularization problem. In this presentation, we propose a new approximation method for the 2D TV regularization problem based on the fast exact 1D TV approach. Computer simulation results are presented to illustrate the performance of the proposed algorithm for image restoration.
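The two TV flavors mentioned above differ only in how the horizontal and vertical differences are combined; a minimal sketch, assuming a forward-difference discretization (one common choice among several):

```python
import numpy as np

def tv_norms(img):
    """Return (anisotropic, isotropic) TV of a 2D array using
    forward differences."""
    dx = np.diff(img, axis=1)  # horizontal differences
    dy = np.diff(img, axis=0)  # vertical differences
    aniso = np.abs(dx).sum() + np.abs(dy).sum()
    # pair the differences on the common interior grid for isotropic TV
    gx = dx[:-1, :]  # drop last row to align shapes
    gy = dy[:, :-1]  # drop last column
    iso = np.sqrt(gx**2 + gy**2).sum()
    return aniso, iso
```

The nonsmoothness the abstract refers to is visible here: both norms involve `abs` or `sqrt` at zero gradient, which is exactly where subgradient or proximal machinery becomes necessary.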
- Research Article
5
- 10.6100/ir582596
- Nov 18, 2015
- Data Archiving and Networked Services (DANS)
The narrowing opportunity window and the dramatically increasing development costs of deep sub-micron application specific integrated circuit (ASIC) designs have presented new challenges to the development process. The cost of ASIC development and fabrication is presently so high that more and more companies are seeking alternative implementation platforms. Today, programmable logic devices (PLDs), in particular the family known as lookup-table (LUT) based field programmable gate arrays (FPGAs), provide this alternative platform. Introduced in 1984, LUT-based FPGAs quickly evolved into sophisticated re-configurable system-on-a-chip (SoC) platforms that contain huge arrays of configurable logic blocks and interconnections for random logic implementation, efficient memory blocks, and embedded processor or intellectual property (IP) cores. FPGAs and FPGA-based re-configurable SoC platforms are off-the-shelf components produced in huge quantities. This not only reduces their cost, but also increases their quality. On the other hand, the complexity of contemporary designs requires new efficient and effective synthesis methods and tools for taking designs from register transfer level (RTL) to silicon. The research reported in this thesis is aimed at the development of an effective and efficient method and electronic design automation (EDA) tool for circuit synthesis targeting LUT-based FPGAs and FPGA-based SoC platforms. The method is based on the information-driven approach to circuit synthesis, general functional decomposition, and information relationship measures. Despite the fact that circuit synthesis methods for FPGAs have been a standard research topic during the last two decades, the topic as addressed here remains crucial for at least two key reasons.
First, earlier approaches tried to incorporate traditional synthesis methods based on some minimal functionally complete systems of logic gates implementing only a few very specific functions (e.g., AND+OR+NOT). Such approaches require post-synthesis technology mapping for other implementation structures. If the actual synthesis target differs strongly from a given minimal system, as in the FPGA case, no technology mapping can guarantee a good result, because the initial synthesis is performed without close relation to the actual target. Second, the new methods, based on functional decomposition, are computationally complex. Because of this computational complexity, it was important to develop adequate approaches and heuristics to make the synthesis methods effective and efficient. To make the heuristics robust, an adequate design decision-making apparatus that controls the decomposition process is vital. In the late nineties, a very promising design decision-making apparatus was proposed by Jóźwiak [Józ97a]. The apparatus was based on information flow modeling in discrete finite functions and relations, and on analysis of information importance in the modeled flows and of information relationships between the flows. The early applications of that apparatus confirmed its potential as a successful measurement apparatus for controlling the heuristic optimization algorithms in the logic synthesis area. The research reported in this thesis is a continuation of that research. The primary goal of the proposed circuit synthesis process is to find the best trade-off between circuit area and speed, represented by the number of look-up tables and the number of levels in the resulting network that implements a given multiple-output incompletely specified Boolean function. A secondary goal is minimization of the number and length of interconnections. The method developed in this work is unique in many ways.
It uses an (enhanced) compositional bottom-up approach to functional decomposition, whereas other methods use an unordered or top-down reduction approach. In the proposed method, all crucial decisions are made during the decomposition process with the use of the theory of information relationships and information relationship measures [Józ97a]. The most important of these decisions are the predecessor sub-function input support construction and selection, and the binary encoding of the multiple-valued sub-functions. Other functional decomposition methods mainly use some structural properties of the resulting networks for decision-making; sometimes, even the most crucial decisions are made randomly. Novel exact and heuristic methods for symmetry detection in incompletely specified Boolean functions were proposed and implemented in the scope of the presented research. The proposed symmetry detection method is based on information modeling using set-systems. The adequate usage of symmetries of Boolean functions results in a significant simplification of the resulting network, and also expedites the decomposition process. The circuit synthesis method developed in the scope of this research was implemented in the form of an electronic design automation (EDA) software tool called IRMA2FPGAS (Information Relationship Measures Applied to FPGA Synthesis). Using this tool and several benchmarks, including the MCNC benchmarks used in academia, an extensive experimental study was performed. The experimental results demonstrate the capability of the information-driven functional decomposition approach and the proposed method to robustly construct high-quality solutions for FPGA-related circuit synthesis problems. Our IRMA2FPGAS tool significantly outperformed all other tools used in the experiments in both circuit speed and area.
- Conference Article
27
- 10.1109/sips.2005.1579941
- May 11, 2010
Field Programmable Gate Arrays (FPGAs) are now considered a real alternative for Digital Signal Processing (DSP) applications. However, new methodologies are still needed to automatically map a DSP application onto an FPGA with respect to design constraints such as area, power consumption, execution time, and time-to-market. Moreover, DSP applications are frequently specified using floating-point arithmetic, whereas fixed-point arithmetic should be used on FPGAs. In this paper, a high-level synthesis methodology under constraints is presented. Its originality is to consider a computation accuracy constraint. The methodology is based on a fixed-point operator library which characterizes operator cost according to wordlength. An error noise propagation model is used to compute an analytical expression of the accuracy as a function of the signal wordlengths. To obtain an efficient hardware implementation, the data wordlength optimization process is coupled with the high-level synthesis. In addition, the accuracy evaluation is done through an analytical method, which drastically reduces the optimization time.
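The accuracy-versus-wordlength trade-off at the heart of this methodology can be made concrete with a toy model (a generic half-LSB rounding model, not the paper's noise-propagation framework):

```python
def quantize(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits; the
    round-off error is bounded by 2**-(frac_bits + 1)."""
    scale = 1 << frac_bits
    return round(x * scale) / scale

def worst_error(samples, frac_bits):
    """Largest observed quantization error over a set of sample values."""
    return max(abs(quantize(s, frac_bits) - s) for s in samples)
```

Sweeping `frac_bits` against an error budget is a crude stand-in for the analytical wordlength optimization the abstract describes: each extra fractional bit halves the error bound but widens every operator on the datapath.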
- Research Article
3
- 10.1016/j.cag.2013.10.019
- Nov 5, 2013
- Computers & Graphics
Efficient schemes for joint isotropic and anisotropic total variation minimization for deblurring images corrupted by impulsive noise
- Research Article
1
- 10.4236/cs.2016.711323
- Jan 1, 2016
- Circuits and Systems
An open source high level synthesis fixed-to-floating and floating-to-fixed conversion tool is presented for embedded design, communication systems, and signal processing applications. Many systems use a fixed point number system. Fixed point numbers often need to be converted to floating point numbers for higher accuracy, dynamic range, fixed-length transmission limitations or end user requirements. A similar conversion system is needed to convert floating point numbers to fixed point numbers due to the advantages that fixed point numbers offer when compared with floating point number systems, such as compact hardware, reduced verification time and design effort. The latest embedded and SoC designs use both number systems together to improve accuracy or reduce required hardware in the same design. The proposed open source design and verification tool converts fixed point numbers to floating point numbers, and floating point numbers to fixed point numbers using the IEEE-754 floating point number standard. This open source design tool generates HDL code and its test bench that can be implemented in FPGA and VLSI systems. The design can be compiled and simulated using open source Iverilog/GTKWave and verified using Octave. A high level synthesis tool and GUI are designed using C#. The proposed design tool can increase productivity by reducing the design and verification time, as well as reduce the development cost due to the open source nature of the design tool. The proposed design tool can be used as a standalone block generator or implemented into current designs to improve range, accuracy, and reduce the development cost. The generated design has been implemented on Xilinx FPGAs.
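The two conversion directions such a tool automates can be sketched in a few lines of Python (an illustration using single-precision IEEE-754 and a generic Qm.n interpretation; the tool itself emits HDL, not Python):

```python
import struct

def float_to_fields(x):
    """Decompose an IEEE-754 single-precision value into its
    sign, biased exponent, and fraction bit-fields."""
    (u,) = struct.unpack('>I', struct.pack('>f', x))
    return u >> 31, (u >> 23) & 0xFF, u & 0x7FFFFF

def q_to_float(raw, frac_bits):
    """Interpret an integer as a fixed-point value with frac_bits
    fractional bits (Qm.n style): value = raw / 2**frac_bits."""
    return raw / (1 << frac_bits)
```

The bit-field split mirrors what the generated HDL has to do in hardware: extract sign, exponent, and mantissa, then shift the mantissa by the exponent to reach the fixed-point grid.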
- Research Article
4
- 10.5897/ijps11.424
- May 18, 2011
- International Journal of the Physical Sciences
The key issue in applying Turbo codes is to find an efficient implementation of the turbo decoder. This paper addresses the implementation of a simplified and efficient turbo decoder in field programmable gate array (FPGA) technology. A simplified and efficient implementation of a Turbo decoder with minor performance loss is proposed. An integer Turbo decoder based on the standard 2's complement number system is introduced, after considering the issues of dynamic range, truncation effects, and other algorithm-related subjects. The efficiency of the implementation comes from algorithm modification, integer arithmetic, and compact hardware management. Based on the Max-Log-MAP decoding algorithm, the branch metric is modified by weighting the a priori value, resulting in a significant BER improvement. The Turbo decoder takes 8-level integer inputs, generates 7-bit soft decisions, and calculates all metrics on integers, avoiding complex floating-point or fixed-point arithmetic. By manipulating memory addresses, the delay associated with interleaving and de-interleaving is eliminated, resulting in much higher throughput. Also, by taking advantage of the identical decoder function, the Turbo decoder is implemented in a single-decoder structure, making efficient use of memory and logic cells. Key words: Turbo, Max-Log-MAP, field programmable gate array, bit error ratio.
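The Max-Log-MAP simplification the decoder relies on replaces exact log-domain addition by a maximum, trading a bounded error (at most ln 2) for hardware simplicity; a small sketch of the two variants:

```python
import math

def jacobian_log(a, b):
    """Exact log-domain addition: log(exp(a) + exp(b)),
    written in the numerically stable max-plus-correction form."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def max_log(a, b):
    """Max-Log-MAP approximation: drop the correction term."""
    return max(a, b)
```

In hardware, `max_log` is just a comparator, while the dropped `log1p` correction would otherwise need a lookup table; the branch-metric weighting mentioned in the abstract partially compensates for this approximation.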
- Conference Article
4
- 10.1109/cdc40024.2019.9029910
- Dec 1, 2019
We present a method for determining the smallest precision required to have algorithmic stability of an implementation of the Fast Gradient Method (FGM) when solving a linear Model Predictive Control (MPC) problem in fixed-point arithmetic. We derive two models for the round-off error present in fixed-point arithmetic. The first is a generic model with no assumptions on the predicted system or weight matrices. The second is a parametric model that exploits the Toeplitz structure of the MPC problem for a Schur-stable system. We also propose a metric for measuring the amount of round-off error the FGM iteration can tolerate before becoming unstable. This metric is combined with the round-off error models to compute the minimum number of fractional bits needed for the fixed-point data type. Using these models, we show that exploiting the MPC problem structure nearly halves the number of fractional bits needed to implement an example problem. We show that this results in significant decreases in resource usage, computational energy and execution time for an implementation on a Field Programmable Gate Array.
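The core question, how many fractional bits keep round-off below a tolerance, can be illustrated generically (a sketch with a simple half-LSB rounding bound, not the paper's FGM-specific round-off models):

```python
def min_frac_bits(tolerance):
    """Smallest number of fractional bits whose round-to-nearest
    error bound 2**-(f+1) does not exceed the given tolerance."""
    f = 0
    while 2.0 ** -(f + 1) > tolerance:
        f += 1
    return f
```

The paper's contribution is precisely a sharper version of `tolerance`: by exploiting the Toeplitz structure of the MPC problem, the tolerable per-iteration round-off grows, so fewer fractional bits are needed than a generic bound would suggest.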
- Research Article
- 10.3844/jcssp.2011.1894.1899
- Dec 1, 2011
- Journal of Computer Science
Problem statement: Parallel array multipliers are required to achieve high execution speed in Digital Signal Processing (DSP) applications. Approach: The purpose of this article is to investigate Field Programmable Gate Array (FPGA) implementations of standard Braun's multipliers on Spartan-3AN, Virtex-2, Virtex-4, and Virtex-5 FPGAs using the Very high speed integrated circuit Hardware Description Language (VHDL). The delay study was analyzed using the Analysis of Variance (ANOVA) method in the Statistical Package for the Social Sciences (SPSS) software, with a 0.05 confidence level, to compare the FPGA devices. Results: The FPGA resource utilization by Virtex-5 is the lowest for 4×4, 6×6, 8×8, and 12×12-bit Braun's multipliers as compared to Spartan-3AN, Virtex-2, and Virtex-4 FPGAs. The average connection delays in Virtex-2 show consistency and a gradual increase in value as the size of the multiplier increases. The Virtex-2 FPGA demonstrates lower average connection delays as compared to Spartan-3AN, Virtex-4, and Virtex-5 FPGAs; the same observation holds for the maximum pin delay. Anomalies in maximum pin delay and average connection delay are observed in Virtex-5, Virtex-4, and Spartan-3AN FPGAs. The FPGA devices also demonstrate that as the size of the multipliers increases, their mean latency also increases. Conclusion: The FPGA resource utilization by Virtex-5 is the lowest for 4×4, 6×6, 8×8, and 12×12-bit Braun's multipliers as compared to Spartan-3AN, Virtex-2, and Virtex-4 FPGAs. Even the values obtained for the Virtex-5 FPGA for the 4×4-bit standard Braun's multiplier, in terms of occupied slices and look-up tables, are lower than those reported in the literature.
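A Braun multiplier forms AND-gate partial products and sums them in an array of full adders; its bit-level arithmetic can be mimicked behaviorally (a sketch only: carry handling here is done by Python integer addition, not by the carry-save adder array the hardware uses):

```python
def braun_multiply(a, b, n=4):
    """Bit-level unsigned n-by-n multiplication in the spirit of a
    Braun array: each (i, j) partial product is one AND gate whose
    output is weighted by 2**(i + j) and accumulated."""
    assert 0 <= a < (1 << n) and 0 <= b < (1 << n)
    product = 0
    for i in range(n):
        for j in range(n):
            pp = (a >> i) & (b >> j) & 1  # one AND gate of the array
            product += pp << (i + j)
    return product
```

The n² AND gates and the adder rows they feed are what the slice/LUT counts in the article's resource comparison are measuring.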
- Research Article
53
- 10.1137/140969993
- Jan 1, 2014
- SIAM Journal on Imaging Sciences
In many image and signal processing applications, such as interferometric synthetic aperture radar (SAR), electroencephalogram (EEG) data analysis, or color image restoration in HSV or LCh spaces, the data has its range on the one-dimensional sphere $\mathbb S^1$. Although the minimization of total variation (TV) regularized functionals is among the most popular methods for edge-preserving image restoration, such methods were only very recently applied to cyclic structures. However, as for Euclidean data, TV regularized variational methods suffer from the so-called staircasing effect. This effect can be avoided by involving higher order derivatives in the functional. This is the first paper which uses higher order differences of cyclic data in regularization terms of energy functionals for image restoration. We introduce absolute higher order differences for $\mathbb S^1$-valued data in a sound way which is independent of the chosen representation system on the circle. Our absolute cyclic first order difference is just the geodesic distance between points. Similar to the geodesic distances, the absolute cyclic second order differences take values only in $[0,\pi]$. We update the cyclic variational TV approach with our new cyclic second order differences. To minimize the corresponding functional we apply a cyclic proximal point method which was recently successfully proposed for Hadamard manifolds. Choosing appropriate cycles, this algorithm can be implemented in an efficient way. The main steps require the evaluation of proximal mappings of our cyclic differences, for which we provide analytical expressions. Under certain conditions we prove the convergence of our algorithm. Various numerical examples with artificial as well as real-world data demonstrate the advantageous performance of our algorithm.
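The absolute cyclic first-order difference, i.e. the geodesic distance on the circle, is simple to state in code (a sketch consistent with the definition above; the modulo reduction is what makes it independent of the chosen angular representative):

```python
import math

def cyclic_diff(a, b):
    """Absolute cyclic first-order difference: the geodesic distance
    between two angles on the unit circle, always in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)
```

Note how angles just below 2π and just above 0 come out close, which is exactly the cyclic structure that ordinary Euclidean differences get wrong on $\mathbb S^1$-valued data.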
- Conference Article
7
- 10.1145/3338852.3339853
- Aug 26, 2019
In this paper, a hardware design based on a field programmable gate array (FPGA) to implement a linear regression algorithm is presented. The arithmetic operations were optimized by applying a fixed-point number representation for all hardware-based computations. A floating-point training data set was initially created and stored on a personal computer (PC), then converted to fixed-point representation and transmitted to the FPGA via a serial communication link. With the proposed VHDL design description synthesized and implemented within the FPGA, the custom hardware architecture performs the linear regression algorithm based on matrix algebra, considering a fixed-size training data set. To validate the hardware fixed-point arithmetic operations, the same algorithm was implemented in the Python language and the results of the two computation approaches were compared. The power consumption of the proposed embedded FPGA system was estimated to be 136.82 mW.
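The flow described, quantize the training data and then solve the least-squares problem, can be emulated in a few lines (a software model with assumed parameters, not the paper's VHDL architecture):

```python
import numpy as np

def fit_line(xs, ys, frac_bits=None):
    """Least-squares line fit; if frac_bits is given, every input is
    first rounded to that fixed-point grid to mimic an FPGA datapath."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    if frac_bits is not None:
        scale = 2.0 ** frac_bits
        xs = np.round(xs * scale) / scale
        ys = np.round(ys * scale) / scale
    A = np.column_stack([xs, np.ones_like(xs)])  # design matrix [x, 1]
    coef, *_ = np.linalg.lstsq(A, ys, rcond=None)
    return coef  # (slope, intercept)
```

Running the fit with and without `frac_bits` is the same float-versus-fixed comparison the paper performs between its Python reference and the FPGA output.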
- Research Article
4
- 10.1117/1.3205081
- Aug 1, 2009
- Optical Engineering
Numerical models of holograms are available but their evaluation is a very computationally intensive task. We present an acceleration algorithm for optical field synthesis suitable for the reduced occlusion method of hologram synthesis. The acceleration uses an approximation that is designed for a field programmable gate array (FPGA) and therefore it mostly uses fixed point numbers. The work describes the approximation, its fixed-point modification, and the resulting FPGA structure. The results presented show that the solution produces high-quality holograms in a significantly reduced time due to efficient FPGA implementation.
- Conference Article
- 10.1109/aeeca52519.2021.9574428
- Aug 27, 2021
To accelerate the hardware design and reduce resource requirements, this paper proposes a realization of a fully connected stacked auto-encoder (SAE) with fixed-point number representations in a field programmable gate array (FPGA). This SAE neural network structure can process the input feature space, including spectral-based features and high-order cumulants, to classify modulation types intelligently. A series of synthesizable Verilog codes were created and simulated with Xilinx Vivado software. Matrix multiplication is implemented by a cyclic multiplication operation, and the activation function is realized by a piece-wise function method in Verilog. There is also an SAE model operating on a GPU platform, and this paper compares the performance of the SAE with fixed-point numbers on the FPGA platform to that with floating-point numbers on the GPU platform. The experimental results show that the SAE runs faster on the FPGA than on the GPU, while the precision on the FPGA is lower than on the GPU, within an acceptable range.
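A piece-wise activation approximation of the kind mentioned can be as simple as one clamped linear segment (an illustrative choice of breakpoints, not the paper's Verilog function):

```python
def pwl_sigmoid(x):
    """Hardware-friendly sigmoid sketch: the linear segment 0.5 + x/8
    clamped to [0, 1]; only shifts, adds, and comparisons are needed."""
    return max(0.0, min(1.0, 0.5 + x / 8.0))
```

This single-segment version is deliberately coarse (worst-case error versus the true sigmoid is on the order of 0.1); adding more segments reduces the error at the cost of extra comparators and constants, which is the precision/resource trade-off the abstract reports between the FPGA and GPU models.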
- Book Chapter
1
- 10.1007/3-540-36605-9_35
- Jan 1, 2003
Evolutionary Algorithms (EAs) have been proposed as a very powerful heuristic optimization technique for solving complex problems. Many case studies have shown that they work very efficiently on a large set of problems, but in general high quality can only be obtained at high run-time cost. In the past, several approaches based on parallel implementations have been studied to speed up EAs. In this paper we present a technique for the implementation of EAs in hardware based on the concept of reusable modules. These modules are described in a Hardware Description Language (HDL). The resulting "hardware EA" can be directly synthesized and mapped to Application Specific Integrated Circuits (ASICs) or Field Programmable Gate Arrays (FPGAs). This approach finds direct application in signal processing, where hardware implementations are often needed to meet the run-time requirements of a real-time system. In our prototype implementation we used VHDL and synthesized an EA for solving the OneMax problem. Simulation results show the feasibility of the approach. Due to the use of a standard HDL, the components can be reused in the form of a library.
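The OneMax benchmark used in the prototype is easy to reproduce in software (a simple (1+1)-EA sketch in Python rather than VHDL, purely to illustrate the algorithm class the hardware implements; the paper's EA may differ in population size and operators):

```python
import random

def onemax_ea(n_bits=32, seed=0, max_gens=20000):
    """(1+1) evolutionary algorithm on OneMax: flip each bit with
    probability 1/n_bits, keep the child if it is no worse, and stop
    once the all-ones string (maximum bit count) is reached."""
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(n_bits)]
    for _ in range(max_gens):
        child = [b ^ (rng.random() < 1.0 / n_bits) for b in parent]
        if sum(child) >= sum(parent):
            parent = child
        if sum(parent) == n_bits:
            break
    return parent
```

Every operation here (bit flips, popcount, comparison) maps directly onto the kind of random-logic modules the paper synthesizes, which is why OneMax is a natural first hardware benchmark.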
- Conference Article
3
- 10.1117/12.596633
- Feb 25, 2005
- Proceedings of SPIE, the International Society for Optical Engineering/Proceedings of SPIE
The imaging radar uses high-frequency electromagnetic waves reflected from different objects to estimate their parameters. Pulse compression is a standard signal processing technique used to minimize the peak transmission power, maximize SNR, and obtain better resolution. Usually, pulse compression is achieved using a matched filter. The level of the side-lobes in the imaging radar can be reduced using special weighting-function processing. There are several well-known weighting functions: Hamming, Hanning, Blackman, Chebyshev, Blackman-Harris, Kaiser-Bessel, etc., widely used in signal processing applications. Field Programmable Gate Arrays (FPGAs) offer great benefits such as instantaneous implementation, dynamic reconfiguration, and field programmability. This reconfigurability makes FPGAs a better solution than custom-made integrated circuits. This work aims at demonstrating a reasonably flexible implementation of a linear-FM signal and pulse compression using Matlab, Simulink, and System Generator. Employing an FPGA and the mentioned software, we have proposed a pulse compression design on FPGA using classical and novel windowing techniques to reduce the side-lobe levels. This permits increasing the detection ability of small or closely spaced targets in imaging radar. The ability of FPGAs to perform parallel real-time processing permits realization of the proposed algorithms. The paper also presents experimental results of the proposed windowing procedure in a marine radar with the following parameters: the signal is linear FM (chirp); the frequency deviation ΔF is 9.375 MHz; the pulse width T is 3.2 μs; the matched filter has 800 taps; the sampling frequency is 253.125 MHz. Side-lobe levels were reduced in real time, permitting better resolution of small targets.
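The matched-filter pulse compression and window weighting described can be prototyped in a few lines (illustrative parameters, not the marine radar's; NumPy stands in for the FPGA datapath):

```python
import numpy as np

def chirp_compression(n=256, bandwidth=0.25, window=True):
    """Pulse-compress a linear FM chirp with its matched filter;
    optionally apply a Hamming window to the reference to lower
    the side-lobes. Returns the magnitude of the full correlation."""
    t = np.arange(n)
    phase = np.pi * bandwidth * t**2 / n      # linear FM phase
    chirp = np.exp(1j * phase)
    ref = np.conj(chirp[::-1])                # matched-filter impulse response
    if window:
        ref = ref * np.hamming(n)             # side-lobe weighting
    return np.abs(np.convolve(chirp, ref))
```

The windowed output trades a slightly lower, wider main lobe for much lower side-lobes, which is exactly the detection-of-small-targets benefit the abstract claims for the weighting functions.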