Articles published on FPGA interconnect
24 Search results
- Research Article
6
- 10.1145/3501802
- Dec 9, 2022
- ACM Transactions on Reconfigurable Technology and Systems
- Stefan Nikolić + 2 more
In this work, we develop timing-driven CAD support for FPGA architectures with direct connections between LUTs. We do so by proposing an efficient ILP-based detailed placer, which moves a carefully selected subset of LUTs from their original positions, so that connections of the user circuit can be appropriately aligned with the direct connections of the FPGA, reducing the circuit’s critical path delay. We discuss various aspects of making such an approach practicable, from efficient formulation of the integer programs themselves, to appropriate selection of the movable nodes. These careful considerations enable simultaneous movement of tens of LUTs with tens of candidate positions each, in a matter of minutes. In this manner, the impact of additional connections on the critical path delay more than doubles, compared to the previously reported results that relied solely on architecture-oblivious placement.
- Research Article
25
- 10.1145/3472769
- Jan 28, 2022
- ACM Transactions on Reconfigurable Technology and Systems
- Sahand Salamat + 3 more
As the size of data generated every day grows dramatically, the computational bottleneck of computer systems has shifted toward storage devices. The interface between the storage and the computational platforms has become the main limitation due to its limited bandwidth, which does not scale when the number of storage devices increases. Interconnect networks do not provide simultaneous access to all storage devices and thus limit the performance of the system when executing independent operations on different storage devices. Offloading the computations to the storage devices eliminates the burden of data transfer from the interconnects. Near-storage computing offloads a portion of computations to the storage devices to accelerate big data applications. In this article, we propose a generic near-storage sort accelerator for data analytics, NASCENT2, which utilizes Samsung SmartSSD, an NVMe flash drive with an on-board FPGA chip that processes data in situ. NASCENT2 consists of dictionary decoder, sort, and shuffle FPGA-based accelerators to support sorting database tables based on a key column with any arbitrary data type. It exploits data partitioning applied by data processing management systems, such as SparkSQL, to break down the sort operations on colossal tables into multiple sort operations on smaller tables. NASCENT2 generic sort provides 2× speedup and 15.2× energy efficiency improvement as compared to the CPU baseline. It moreover considers the specifications of the SmartSSD (e.g., the FPGA resources, interconnect network, and solid-state drive bandwidth) to increase the scalability of computer systems as the number of storage devices increases. With 12 SmartSSDs, NASCENT2 is 9.9× (137.2×) faster and 7.3× (119.2×) more energy efficient in sorting the largest tables of TPCC and TPCH benchmarks than the FPGA (CPU) baseline.
- Research Article
2
- 10.1088/1674-4926/41/2/022405
- Feb 1, 2020
- Journal of Semiconductors
- Ruiqi Luo + 2 more
Previous studies show that interconnects occupy a large portion of the timing budget and area in FPGAs. In this work, we propose a time-multiplexing technique on FPGA interconnects. In order to fully exploit this interconnect architecture, we propose a time-multiplexed routing algorithm that can actively identify qualified nets and schedule them to multiplexable wires. We validate the algorithm by using the router to implement 20 benchmark circuits to time-multiplexed FPGAs. We achieve a 38% smaller minimum channel width and 3.8% smaller circuit critical path delay compared with the state-of-the-art architecture router when a wire can be time-multiplexed six times in a cycle.
- Research Article
6
- 10.1145/3375459
- Jan 30, 2020
- ACM Transactions on Reconfigurable Technology and Systems
- Alexandra Kourfali + 1 more
In this work, a novel method for in-circuit debugging on FPGAs is introduced that allows the insertion of low-overhead debugging infrastructure by exploiting the technique of parameterized configurations. This allows the parameterization of the LUTs and the routing infrastructure to create a virtual network of debugging multiplexers. It aims to facilitate debugging, to increase the internal signal observability, and to reduce the debugging (area and reconfiguration) overhead. Signal ranking techniques are also introduced that classify signals that can be traced during debug. Finally, the results of the method are presented and compared with a commercial tool. The area and time results and the tradeoffs between internal signal observability and area and reconfiguration overhead are also explored.
- Research Article
5
- 10.1007/s10836-019-05827-7
- Oct 1, 2019
- Journal of Electronic Testing
- Shukla Banik + 2 more
This paper presents an FPGA interconnect test configuration generation strategy for application-independent testing using Satisfiability (SAT). The technique generates all possible path configurations for the interconnect to obtain full coverage of all interconnect resources. An integrated testing approach is proposed which generates test vectors and path configurations in a single phase, thus obtaining a significant reduction in the number of test configurations needed to test the circuit. To generate test configurations, constraints have been designed using SAT. The proposed technique targets open and short faults in the interconnect resources. Test configurations have been generated for different FPGA architectures. The objective of the proposed approach is to minimize the number of configurations without reducing the fault coverage.
- Research Article
6
- 10.1109/tvlsi.2017.2691409
- Aug 1, 2017
- IEEE Transactions on Very Large Scale Integration (VLSI) Systems
- Safeen Huda + 1 more
Conventional field-programmable gate arrays are typically overprovisioned with routing resources to ensure that they meet routability targets, which results in increased routing static and dynamic power. In this paper, we leverage the excess routing conductors to reduce dynamic and static power. To reduce dynamic power, we propose to ensure that used routing conductors are adjacent to unused routing conductors, which are left floating to reduce the effective capacitance seen by active nets. To reduce static power, we observe that leakage in routing multiplexers is dominated by specific paths; if the routing conductors, which connect to the input pins on these paths, are unused and left floating, the leakage of the multiplexer may be significantly reduced. To ensure that unused conductors are allowed to float requires the use of tristate routing buffers, and thus we propose two low-cost tristate buffer topologies with different power and area-overhead tradeoffs. We also introduce CAD techniques to optimize the overall energy dissipation in the routing network using the proposed techniques. Results show that interconnect dynamic power reductions of up to 25%, interconnect static power reductions of up to 81%, and overall interconnect energy reductions ranging between 14.9% and 42.7% are expected, with a critical path degradation of <1.8% and area overhead of 2.6%-4.8%.
- Research Article
- 10.1016/j.microrel.2015.01.011
- Feb 1, 2015
- Microelectronics Reliability
- A Ben Dhia + 3 more
A dual-rail compact defect-tolerant multiplexer
- Research Article
26
- 10.1145/2629442
- Aug 1, 2014
- ACM Transactions on Reconfigurable Technology and Systems
- Mohamed S Abdelfattah + 1 more
As FPGA capacity increases, a growing challenge is connecting ever-more components with the current low-level FPGA interconnect while keeping designers productive and on-chip communication efficient. We propose augmenting FPGAs with networks-on-chip (NoCs) to simplify design, and we show that this can be done while maintaining or even improving silicon efficiency. We compare the area and speed efficiency of each NoC component when implemented hard versus soft to explore the space and inform our design choices. We then build on this component-level analysis to architect hard NoCs and integrate them into the FPGA fabric; these NoCs are on average 20-23× smaller and 5-6× faster than soft NoCs. A 64-node hard NoC uses only ~2% of an FPGA's silicon area and metallization. We introduce a new communication efficiency metric: silicon area required per realized communication bandwidth. Soft NoCs consume 4960 mm²/TBps, but hard NoCs are 84× more efficient at 59 mm²/TBps. Informed design can further reduce the area overhead of NoCs to 23 mm²/TBps, which is only 2.6× less efficient than the simplest point-to-point soft links (9 mm²/TBps). Despite this almost comparable efficiency, NoCs can switch data across the entire FPGA while point-to-point links are very limited in capability; therefore, hard NoCs are expected to improve FPGA efficiency for more complex styles of communication.
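The efficiency ratios quoted in this abstract follow directly from its area-per-bandwidth figures. The short check below is only an illustrative sketch: the variable names are made up for this example, and the numbers are the ones stated in the abstract, in mm² per TBps.

```python
# Area per realized bandwidth (mm^2 per TBps), as quoted in the abstract.
# Variable names are illustrative, not taken from the paper.
soft_noc = 4960.0        # soft NoC
hard_noc = 59.0          # hard NoC
hard_noc_tuned = 23.0    # hard NoC after "informed design"
soft_p2p = 9.0           # simplest point-to-point soft links

# Lower area per TBps means higher efficiency, so the efficiency
# gain is the ratio of the larger figure to the smaller one.
hard_vs_soft = soft_noc / hard_noc
tuned_vs_p2p = hard_noc_tuned / soft_p2p

print(f"hard NoC vs soft NoC: {hard_vs_soft:.0f}x more efficient")
print(f"tuned hard NoC vs soft P2P: {tuned_vs_p2p:.1f}x less efficient")
```

Both ratios round to the values given in the abstract (84× and 2.6×).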
- Research Article
28
- 10.1109/tcad.2013.2291659
- Mar 1, 2014
- IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
- Elias Vansteenkiste + 3 more
Dynamic partial reconfiguration of FPGAs enables the dynamic specialization of the circuit for the runtime needs of the application. Previously a tool flow, called the TLUT tool flow, was developed to aid the designer in applying dynamic circuit specialization (DCS) for their designs. The TLUT tool flow generates an implementation in which the lookup tables (LUTs) can be specialized during runtime. In this paper, place and route algorithms are described for the TCON tool flow. The TCON tool flow generates implementations in which not only the logic infrastructure (LUTs) is dynamically specialized, but also the routing infrastructure of the FPGA. Exploiting the reconfigurability of the FPGA interconnection network further improves area (50% to 92% fewer LUTs and 36% to 81% less wiring), logic depth (a 63% to 80% reduction) and power consumption. To achieve this, major changes were needed, not only in the mapping, but also in the place and route steps. This work describes the altered place and route algorithms, called TPlace and Troute.
- Research Article
1
- 10.1155/2014/279673
- Jan 1, 2014
- International Journal of Reconfigurable Computing
- Naveed Imran + 1 more
Distance-Ranked Fault Identification (DRFI) is a dynamic reconfiguration technique which employs runtime inputs to conduct online functional testing of fielded FPGA logic and interconnect resources without test vectors. At design time, a diverse set of functionally identical bitstream configurations is created which utilize alternate hardware resources in the FPGA fabric. An ordering is imposed on the configuration pool as updated by the PageRank indexing precedence. The configurations which utilize permanently damaged resources, and hence manifest discrepant outputs, receive lower rank and are thus less preferred for instantiation on the FPGA. Results indicate accurate identification of fault-free configurations in a pool of pregenerated bitstreams with a low number of reconfigurations and input evaluations. For MCNC benchmark circuits, the observed reduction in input evaluations is up to 75% when comparing the DRFI technique to unguided evaluation. The DRFI diagnosis method is seen to isolate all 14 healthy configurations from a pool of 100 pregenerated configurations, thereby offering a 100% isolation accuracy provided the fault-free configurations exist in the design pool. When a complete recovery is not feasible, graceful degradation may be realized, which is demonstrated by the PSNR improvement of images processed in a video encoder case study.
- Research Article
25
- 10.1109/tc.2011.247
- Jan 1, 2013
- IEEE Transactions on Computers
- T Nandha Kumar + 1 more
This paper presents a new method for generating configurations for application-dependent testing of an SRAM-based FPGA interconnect. This method connects an activating input to multiple nets, thus generating activating test vectors for detecting stuck-at, open, and bridging faults. This arrangement permits a reduction in the number of redundant configurations, thus also achieving a reduction in test time for application-dependent testing at full fault coverage. As the underlying solution requires an exponential complexity, a heuristic algorithm that is polynomial and greedy in nature (based on sorting) is used for net selection in the configuration generation process. It is proved that this algorithm has an execution complexity of O(L³) (where L is the number of LUTs in the design). The proposed method requires at most log₂(M + 2) configurations (where M denotes the number of activating inputs) as Walsh coding is employed. Moreover, it is scalable with respect to LUT inputs. Extensive logic-based simulation results are provided for ISCAS89 sequential benchmark designs implemented on Xilinx Virtex4 FPGAs; these results show that the proposed method achieves a considerable reduction in the number of test configurations compared with methods found in the technical literature (on average, a reduction of 49.5 percent).
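To get a feel for how slowly the stated bound of log₂(M + 2) configurations grows with the number of activating inputs M, the sketch below simply evaluates that expression. It does not implement the paper's algorithm or its Walsh coding. The function name and the sample values of M are hypothetical, chosen so that M + 2 is a power of two and the bound is a whole number.

```python
import math

def config_bound(num_activating_inputs: int) -> float:
    """Evaluate the abstract's upper bound log2(M + 2) for a given M.

    Hypothetical helper for illustration; it only computes the
    closed-form bound, not the test configurations themselves.
    """
    return math.log2(num_activating_inputs + 2)

# Sample values of M picked so that M + 2 is an exact power of two.
samples = [6, 30, 1022]
bounds = {m: config_bound(m) for m in samples}

for m, b in bounds.items():
    print(f"M = {m:5d} activating inputs -> at most {b:.0f} configurations")
```

Going from 6 to 1022 activating inputs raises the bound only from 3 to 10, which is the logarithmic scaling the abstract refers to.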
- Addendum
1
- 10.1007/s10836-011-5249-0
- Sep 6, 2011
- Journal of Electronic Testing
- Jianfeng Zhu + 3 more
Erratum to: A Cost-Efficient Self-Configurable BIST Technique for Testing Multiplexer-Based FPGA Interconnect
- Research Article
6
- 10.1007/s10836-011-5238-3
- Jul 30, 2011
- Journal of Electronic Testing
- Zhu Jianfeng + 3 more
FPGA test cost can be reduced effectively by minimizing the number of test configurations. To this end, a self-configurable structure was previously proposed for testing the cross-point-based switch box in FPGAs. In this paper, a technique of partially self-configurable multiplexers is presented to cost-efficiently reduce the test cost of completely multiplexer-based FPGA interconnect. The additional self-configured structure, called a test point here, is added only to the most efficient configuration ports, which are selected by analyzing the test configurations, so the test cost is reduced with minimal area overhead. It is shown that for testing all interconnect stuck-at faults in FPGAs like Virtex-II and Spartan-3, the test configurations can be reduced to 8 with an area penalty of only about 1.2%.
- Research Article
- 10.3724/sp.j.1146.2009.01007
- Aug 26, 2010
- Journal of Electronics & Information Technology
- Wei Li + 2 more
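Single-driver and multi-driver implementations are the two implementation forms of unidirectional interconnect channels in FPGAs. This paper discusses their differences in layout area, delay, and other aspects, as well as the restrictions each imposes on the channel architecture. It proposes combining the two implementation forms within the interconnect architecture and gives an effective architecture design method, in which a two-level optimization yields the combination of interconnect segment lengths and the combination of implementation forms that give the optimal area-delay product. Compared with other architectures, the combined architecture obtained with this method, consisting of 50% length-6 single-driver circuits, 25% length-8 multi-driver circuits, and 25% length-8 single-driver circuits, improves the area-delay product by 57% to 86%.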
- Research Article
1
- 10.1016/j.vlsi.2010.01.002
- Jan 29, 2010
- Integration
- Terrence Mak + 3 more
Wave-pipelined intra-chip signaling for on-FPGA communications
- Research Article
3
- 10.1080/00207210801924586
- Jul 1, 2008
- International Journal of Electronics
- Jae Young Hur + 2 more
We present a novel use of wiring flexibility in modern FPGA technology in order to implement an on-demand network topology. Conventional rigid router-based networks on chip incur certain overheads due to huge logic resources occupation and topology embedding. In this work, we implement partially reconfigurable point-to-point (ρ-P2P) interconnects to alleviate such overheads. In our implementation, arbitrary topologies can be realised by updating a partial bitstream for the ρ-P2P interconnects. We consider parallel merge sort, Cannon's matrix multiplication, and wavelet applications to generate network traffic. Furthermore, we implement a packet switched network to serve as a reference. The experiments show that the utilisation of our P2P interconnects performs 2 times better and occupies 70% less area when compared to the reference network. Furthermore, the topology reconfiguration latency is significantly reduced using the Xilinx module-based partial reconfiguration technique. Finally, our experiments suggest that higher performance gains can be achieved as the problem size increases.
- Research Article
13
- 10.1145/1344418.1344426
- Apr 2, 2008
- ACM Transactions on Design Automation of Electronic Systems
- Yu Hu + 3 more
Field programmable dual-Vdd interconnects are effective in reducing FPGA power. We formulate the dual-Vdd-aware slack budgeting problem as a linear program (LP) and a min-cost network flow problem, respectively. Both algorithms reduce interconnect power by 50% on average compared to single-Vdd interconnects, but the network-flow-based algorithm runs 11x faster on MCNC benchmarks. Furthermore, we develop simultaneous retiming and slack budgeting (SRSB) with flip-flop layout constraints in dual-Vdd FPGAs based on mixed integer linear programming, and speed-up the algorithm by LP relaxation and local legalization. Compared to retiming followed by slack budgeting, SRSB reduces interconnect power by up to 28.8%.
- Research Article
4
- 10.1080/00207210701828069
- Mar 1, 2008
- International Journal of Electronics
- Z Marrakchi + 3 more
This paper evaluates a new multilevel hierarchical FPGA (MFPGA). The specific architecture includes two unidirectional programmable networks: a downward network based on the Butterfly-Fat-Tree topology; and a special upward network. New tools are developed to place and route several benchmark circuits on this architecture. Comparison with the traditional symmetric Manhattan mesh architecture shows that MFPGA can implement circuits with a smaller area and better speed.
- Research Article
3
- 10.14288/1.0066778
- Jan 1, 2008
- Open Collections
- Paul Teehan
FPGA clock frequencies are slow enough that only a fraction of the interconnect’s bandwidth is used. By exploiting this bandwidth, the transfer of large amounts of data can be greatly accelerated. Alternatively, it may also be possible to save area on fixed-bandwidth links by using on-chip serial signaling. For datapath-intensive designs which operate on words instead of bits, this can reduce wiring congestion as well. This thesis proposes relatively simple circuit-level modifications to FPGA interconnect to enable high-bandwidth communication. High-level area estimates indicate a potential interconnect area savings of 10 to 60% when serial links are used. Two interconnect pipelining techniques, wave pipelining and surfing, are adapted to FPGAs and compared against each other and against regular FPGA interconnect in terms of throughput, reliability, area, power, and latency. Source-synchronous signaling is used to achieve high data rates with simple receiver design. Statistical models for high-frequency power supply noise are developed and used to estimate the probability of error of wave pipelined and surfing links as a function of link length and operating speed. Surfing is generally found to be more reliable and less sensitive to noise than wave pipelining. Simulation results in a 65nm process demonstrate a throughput of 3Gbps per wire across a 50-stage, 25mm link.
- Research Article
41
- 10.1007/s10836-006-9319-7
- Jun 1, 2006
- Journal of Electronic Testing
- Jack Smith + 2 more
We present an efficient built-in self-test (BIST) architecture for testing and diagnosing stuck-at faults, delay faults, and bridging faults in FPGA interconnect resources. The BIST structure contains self-enabling test pattern generators, self-configurable switch matrices, and response analyzers that all work together and reprogram themselves without any external intervention. This eliminates downloading configuration bitstreams into the FPGA after the start of testing and, hence, reduces test time. Our technique requires only six different switch matrix configurations to test the interconnect, which is fewer than prior methods, while retaining good diagnostic resolution. The area overhead to add self-configurable test structures to Xilinx FPGAs is as low as 0.5%.