Resource-Constrained Software Pipelining for High-Level Synthesis of DSP Systems
- Book Chapter
- 10.1007/0-387-22928-0_18
- Jan 1, 2005
The intense requirements of high-speed implementations of MultiDimensional (MD) Digital Signal Processing (DSP) systems justify Application-Specific Integrated Circuit (ASIC) designs and/or multiprocessor implementations. MD retiming has recently been proposed to improve circuit performance in high-level synthesis of single-rate MD DSP systems. This paper presents a new theoretical analysis of MD multirate DSP systems modeled as data-flow graphs and proposes intercalation of MD multirate systems, so that unified MD retiming operations can be applied to multidimensional multirate DSP systems. Through retiming and intercalation, full intra-iteration parallelism is achieved, and functional elements can be executed simultaneously on circuits for the generic class of MD multirate DSP systems.
Keywords: Multirate Signal Processing; MD Data Flow Graph; Multidimensional Retiming
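A minimal Python sketch of the basic MD retiming step, assuming the common convention d_r(e) = d(e) + r(u) - r(v) for an edge e from u to v, where delays and retiming values are integer vectors. The legality test (every retimed delay vector lexicographically nonnegative) and the function names are illustrative assumptions rather than the chapter's formulation, and intercalation of multirate graphs is not shown.

```python
from typing import Dict, List, Tuple

Vec = Tuple[int, ...]
Edge = Tuple[str, str, Vec]  # (source node, destination node, MD delay vector)


def retime(edges: List[Edge], r: Dict[str, Vec]) -> List[Edge]:
    """Return edges with d_r(e) = d(e) + r(u) - r(v) for each edge u -> v."""
    out = []
    for u, v, d in edges:
        new_d = tuple(di + ru - rv for di, ru, rv in zip(d, r[u], r[v]))
        out.append((u, v, new_d))
    return out


def is_legal(edges: List[Edge]) -> bool:
    """Assumed legality test: every delay vector is lexicographically >= 0.

    Python compares tuples lexicographically, so d >= (0, ..., 0) does this.
    """
    return all(d >= tuple(0 for _ in d) for _, _, d in edges)


def fully_parallel(edges: List[Edge]) -> bool:
    """No zero-delay edge means all nodes of one iteration can run at once."""
    return all(any(x != 0 for x in d) for _, _, d in edges)


# Hypothetical two-node loop: A -> B has no delay, B -> A carries (1, 0).
edges = [("A", "B", (0, 0)), ("B", "A", (1, 0))]
retimed = retime(edges, {"A": (0, 1), "B": (0, 0)})
print(retimed)  # [('A', 'B', (0, 1)), ('B', 'A', (1, -1))]
```

Retiming preserves the total delay around the cycle, (1, 0) here, while moving a delay onto the previously zero-delay edge A to B, so A and B no longer depend on each other within one iteration.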
- Research Article
- 10.1063/1.1147143
- Oct 1, 1996
- Review of Scientific Instruments
We have developed a digital signal processor (DSP) system to extend the capabilities of a variety of scientific instruments used to make in situ ionospheric plasma measurements from sounding rockets and spacecraft. The DSP system is extremely flexible due to the use of a field programmable gate array, sigma-delta analog-to-digital converters, and a highly integrated single-chip DSP. Using virtually identical circuitry, we have operated DSP systems processing data from low-frequency electric field detectors, Langmuir probes, and several radio-frequency receivers on several sounding rockets and spacecraft with different digital telemetry interfaces. The DSP system provides three major improvements to our instruments: (1) reduced telemetry bandwidth (data) requirements, (2) improved signal-to-noise ratio, and (3) digital antialiasing filtering. We describe our DSP system and show three examples of how we have implemented it with our instruments. Similar systems should be of interest to researchers in many fields.
- Conference Article
- 10.1109/arrays.1988.18097
- May 25, 1988
The architecture and performance of a multicomputer-type digital signal processing (DSP) system are discussed. The DSP system, called NOVI, has been created to examine methods for organizing parallel DSP systems and developing parallel programs for a wide range of digital signal processing applications. NOVI presently consists of 36 processing elements (PEs), each using an Inmos Transputer as a CPU. Its parallel-program-development assistant (PDA) system provides powerful debugging functions that observe the state of every PE without interfering with parallel program execution. A parallel-program-development technique using the PDA is discussed. A load-balancing technique for a multicomputer-type DSP is also discussed, focusing on low-bit-rate motion-picture coding. The balancing technique is based on interframe prediction and layered large-grain data flow. The expected performance of the NOVI system is discussed.
- Conference Article
- 10.1109/icassp.1999.758289
- Jan 1, 1999
We address the issue of high-level synthesis of low-power digital signal processing (DSP) systems by proposing switching activity models. In particular, we present a technology-independent hierarchical scheme to compare the relative power performance of two competing DSP systems. The basic building blocks considered for such systems are a full-adder and a one-bit delay. Estimates of switching activity at the outputs of these building blocks are used to model the activity in the different architectural primitives used to build DSP systems. This method is very fast and simple, and simulations show accuracy within 4% of extensive bit-level simulations. Therefore, it can easily be integrated into current communications/DSP CAD tools for low-power applications. The models show that the choice of multiplier/multiplicand is important when using array multipliers in a data-path. If the input signal with the smaller variance is chosen as the multiplicand, up to 20% savings in switching activity can be achieved. This observation is verified by analog simulation.
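The models above are checked against bit-level simulation. For reference, the brute-force estimate of switching activity at a one-bit delay simply counts how often consecutive input bits differ, since the delay's output toggles exactly then. The minimal Python sketch below, with hypothetical function names, shows that reference measurement for single bits and for two's-complement words; it is not the paper's hierarchical model.

```python
from typing import List, Sequence


def bit_activity(bits: Sequence[int]) -> float:
    """Fraction of clock cycles in which a one-bit delay's output toggles."""
    if len(bits) < 2:
        return 0.0
    toggles = sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    return toggles / (len(bits) - 1)


def word_activity(samples: Sequence[int], width: int) -> List[float]:
    """Per-bit activity (LSB first) of a width-bit two's-complement stream."""
    mask = (1 << width) - 1
    words = [s & mask for s in samples]  # wrap negatives to two's complement
    return [bit_activity([(w >> k) & 1 for w in words]) for k in range(width)]


print(bit_activity([0, 1, 1, 0, 1]))  # 0.75: three toggles in four transitions
```

Applying `word_activity` to a low-variance and a high-variance sample stream is one simple way to see why the choice of multiplicand affects switching in a data-path.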
- Conference Article
- 10.1109/fie.2013.6684995
- Oct 1, 2013
In digital signal processing (DSP) and communication systems courses much of the material is theoretical. There are some students who are more motivated to learn if they can see a connection to the real world, but unfortunately many real-world communication and DSP systems are very complex, and including them as part of a course is difficult or impossible. The FM radio, however, is a relatively simple system that is in some ways ideal as a real-world example because it includes both analog and digital signals. The analog signals transmit the audio and the digital Radio Data System (RDS) signal transmits auxiliary information such as the name of the artist, song, current time, etc. This paper describes an FM radio with RDS decoder based on an inexpensive FM module and an affordable DSP board. The system runs in real-time, demodulates FM radio, plays the music through speakers, displays the name of the song and artist, and allows access to the internal signals. This real-time receiver can be used in demonstrations in a lecture course or as the basis for a series of laboratory experiments.
- Research Article
- 10.1016/0141-9331(95)96909-o
- Jan 1, 1995
- Microprocessors and Microsystems
Advanced educational parallel DSP system based on TMS320C25 processors
- Research Article
- 10.1145/384196.384206
- Aug 1, 2001
- ACM SIGPLAN Notices
Embedded systems require maximum performance from a processor within significant constraints in power consumption and chip cost. Using software pipelining, processors can often exploit considerable instruction-level parallelism (ILP), and thus significantly improve performance, at the cost of substantially increasing register requirements. These increasing register requirements, however, make it difficult to build a high-performance embedded processor with a single, multi-ported register file while maintaining clock speed and limiting power consumption. Some digital signal processors, such as the TI C6x, reduce the number of ports required for a register bank by partitioning the register bank into multiple banks. Each disjoint subset of functional units is directly connected to one of the partitioned register banks. Each register bank together with its associated functional units is called a cluster. Clustering reduces the number of ports needed on a per-bank basis, allowing an increased clock rate. However, execution speed can be hampered because of the potential need to copy "non-local" operands among register banks in order to make them available to the functional unit performing an operation. The task of the compiler is both to maximize parallelism and to minimize the number of remote register accesses needed. Previous work has concentrated on methods to partition virtual registers amongst the target architecture's clusters. In this paper, we show how high-level loop transformations can enhance the partitioning obtained by low-level schemes. In our experiments, loop transformations improved software pipelining by 27% on a machine with 2 clusters, each having 1 floating-point and 1 integer register bank and 4 functional units. We also observed a 20% improvement on a similar machine with 4 clusters of 2 functional units.
In fact, by performing the described loop transformations we were able to show improvements of greater than 10% over schedules (for un-transformed loops) generated with the unrealistic assumption of a single multi-ported register bank.
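A modulo-scheduling software pipeliner starts from a lower bound on the initiation interval (II); its resource part, ResMII, is the largest ratio of operations to available units over all resource classes (the full bound also takes the recurrence-constrained RecMII into account). The minimal Python sketch below computes ResMII only; the unit counts are hypothetical, and treating inter-cluster copies as extra integer operations is an illustrative assumption, not the paper's machine model.

```python
from typing import Dict


def res_mii(op_counts: Dict[str, int], units: Dict[str, int]) -> int:
    """Resource-constrained lower bound on the initiation interval.

    ResMII = max over resource classes of ceil(ops in class / units in class).
    """
    return max((ops + units[cls] - 1) // units[cls]  # integer ceiling
               for cls, ops in op_counts.items())


# Hypothetical machine: 4 integer-capable and 2 floating-point units in total.
units = {"int": 4, "fp": 2}
print(res_mii({"int": 6, "fp": 3}, units))      # 2
# Assumption: three inter-cluster copies each occupy an integer unit slot.
print(res_mii({"int": 6 + 3, "fp": 3}, units))  # 3
```

The second call shows why minimizing remote register accesses matters: the extra copy operations alone raise the lower bound on II from 2 to 3 cycles.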
- Book Chapter
- 10.1016/b978-0-12-386535-9.00008-1
- Jan 1, 2012
- DSP for Embedded and Real-Time Systems
Chapter 8 - High-level Design Tools for Complex DSP Applications
- Conference Article
- 10.1109/iscas.1992.230173
- May 10, 1992
This paper presents a novel approach to the design and realization of digital signal processing (DSP) systems using finite state machines (FSMs). The DSP system is modelled by mapping all of its potential states into an FSM, whose complexity is usually very high. The FSM mirrors the complete functionality of the system and thus describes its behavior in full detail. Examples of FSMs for first- and second-order digital recursive filters are provided, and the current version of the software simulating the FSM corresponding to any linear time-invariant DSP system is described. The potential of this approach, including state reduction techniques and the inclusion of nonlinear DSP systems, is also outlined, and future research directions are briefly explored.
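The state space of a first-order recursive filter becomes finite once the stored value has a fixed word length and the input takes values from a finite alphabet. The minimal Python sketch below enumerates the reachable states and the transition table of y[n] = a*y[n-1] + x[n] by breadth-first search. The fixed-point format (a = a_num / 2^frac_bits, floor rounding, saturation) and all names are illustrative assumptions, not the paper's simulator.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def filter_step(y_prev: int, x: int, a_num: int, frac_bits: int,
                lo: int, hi: int) -> int:
    """One step of y[n] = a*y[n-1] + x[n] in integer fixed point."""
    y = ((a_num * y_prev) >> frac_bits) + x  # floor-rounded product plus input
    return max(lo, min(hi, y))               # saturate to the word range


def build_fsm(inputs: Iterable[int], a_num: int, frac_bits: int,
              word_bits: int, start: int = 0):
    """Return (reachable states, {(state, input): next_state})."""
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    inputs = list(inputs)
    states: Set[int] = {start}
    transitions: Dict[Tuple[int, int], int] = {}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for x in inputs:
            nxt = filter_step(s, x, a_num, frac_bits, lo, hi)
            transitions[(s, x)] = nxt
            if nxt not in states:
                states.add(nxt)
                queue.append(nxt)
    return states, transitions


# Hypothetical example: a = 0.5, inputs {0, 2}, 3-bit signed state.
states, table = build_fsm([0, 2], a_num=1, frac_bits=1, word_bits=3)
print(sorted(states))  # [0, 1, 2, 3]
```

Here the output equals the state, so the table fully describes the filter's behavior; a second-order filter would use the pair (y[n-1], y[n-2]) as the state, which is where the state count grows quickly.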
- Conference Article
- 10.1109/comsig.1997.629997
- Sep 9, 1997
This paper describes a methodology that applies ATM traffic analysis techniques to the performance analysis of multichannel digital signal processing (DSP) systems. The analysed multichannel DSP systems employ statistical multiplexing in order to improve DSP resource utilisation. A case study is used to illustrate the application of the proposed methodology. The performance of the studied system was estimated, and it was shown that a significant efficiency gain may be achieved over DSP systems that do not statistically multiplex their resources.
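The intuition behind statistical multiplexing can be shown with a simple binomial model: if each of M channels is independently active with probability p, a pool of N shared DSP resources overflows only when more than N channels are active at once. The minimal Python sketch below assumes independent on/off channels; it is far simpler than the ATM traffic analysis techniques the paper applies, and the function names are hypothetical.

```python
from math import comb


def overflow_probability(channels: int, p_active: float, capacity: int) -> float:
    """P(more than `capacity` of `channels` are active), binomial model."""
    return sum(comb(channels, k) * p_active**k * (1 - p_active)**(channels - k)
               for k in range(capacity + 1, channels + 1))


def min_capacity(channels: int, p_active: float, target: float) -> int:
    """Smallest number of shared resources whose overflow stays <= target."""
    for n in range(channels + 1):
        if overflow_probability(channels, p_active, n) <= target:
            return n
    return channels  # unreachable for target >= 0: capacity M never overflows


print(overflow_probability(2, 0.5, 1))  # 0.25: both channels active at once
```

The efficiency gain comes from `min_capacity` returning fewer resources than `channels` whenever p is small and occasional overflow is tolerable, whereas a non-multiplexed design dedicates one resource per channel.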
- Book Chapter
- 10.1016/b978-075065798-3/50010-2
- Jan 1, 2003
- Practical Digital Signal Processing
10 - Hardware and software development tools
- Research Article
- 10.1504/ijcat.2020.10034164
- Jan 1, 2020
- International Journal of Computer Applications in Technology
In this paper, a real-time Digital Signal Processing (DSP) system is designed and implemented using a PIC24 microcontroller circuit and a C# GUI application running on a PC. Wireless communication between the PIC24 subsystem and the GUI subsystem is implemented via Bluetooth modules on the subsystems. The DSP system first digitises an input square signal of a given frequency through the on-chip ADC of the PIC24 microcontroller, then uses different FIR digital filters to extract certain harmonics of the input signal, and outputs the result as a sinusoidal signal to an on-chip DAC while also sending the sampled and filtered data over Bluetooth to the GUI. Besides plotting the input and output waveforms, the GUI can control all functionalities of the system through a purpose-built communication protocol. The design and implementation of the proposed DSP system are successfully demonstrated by experimental results. The hardware and software co-design method can be extended to other industrial applications and serves as a good paradigm for engineering education of college students.
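Each FIR filter in such a system computes y[n] = sum over k of b[k] * x[n-k]. The minimal Python sketch below is a direct-form implementation with a zero initial state; the coefficients for isolating a particular harmonic would come from a separate filter design step, which is not shown, and the function name is hypothetical.

```python
from typing import List, Sequence


def fir_filter(b: Sequence[float], x: Sequence[float]) -> List[float]:
    """Direct-form FIR: y[n] = sum_k b[k] * x[n - k], zero initial state."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):
            if n - k >= 0:          # samples before the start are zero
                acc += bk * x[n - k]
        y.append(acc)
    return y


# Feeding a unit impulse returns the coefficients: the impulse response.
print(fir_filter([0.5, 0.25], [1, 0, 0]))  # [0.5, 0.25, 0.0]
```

On a microcontroller the same loop is usually written over a circular buffer so that each new ADC sample costs one pass over the coefficients.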
- Conference Article
- 10.23919/date.2017.7927138
- Mar 1, 2017
This paper presents microwatt end-to-end digital signal processing (DSP) systems for deployment-stage real-time upper-limb movement intent decoding. These brain-computer interface (BCI) DSP systems feature intercellular spike detection, sorting, and decoding operations for a 96-channel prosthetic implant. We design the algorithms for these operations to achieve minimal computational complexity while matching or advancing the accuracy of state-of-the-art BCI spike sorting and movement decoding. Based on these algorithms, we architect the DSP hardware with a focus on hardware reuse and event-driven operation. Post-layout simulation of the VLSI implementation of the proposed systems in a 65-nm high-VTH process shows that they can achieve 4.82 μW at a supply voltage of 300 mV. The area is 0.16 mm².
- Book Chapter
- 10.1016/b978-012734530-7/50003-9
- Jan 1, 1999
- DSP Integrated Circuits
3 - Digital signal processing
- Research Article
- 10.1109/81.895327
- Jan 1, 2000
- IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications
This paper introduces a technique, called resynchronization, for reducing synchronization overhead in multiprocessor implementations of digital signal processing (DSP) systems. The technique applies to arbitrary collections of dedicated, programmable, or configurable processors, such as combinations of programmable DSPs, ASICs, and FPGA subsystems. Thus, it is particularly well suited to the evolving trend toward heterogeneous single-chip multiprocessors in DSP systems. Resynchronization exploits the well-known observation that in a given multiprocessor implementation, certain synchronization operations may be redundant in the sense that their associated sequencing requirements are ensured by other synchronizations in the system. The goal of resynchronization is to introduce new synchronizations in such a way that the number of original synchronizations that become redundant exceeds the number of new synchronizations added, so that the net synchronization cost is reduced. Our study is set in the context of self-timed execution of iterative dataflow specifications of DSP applications. An iterative dataflow specification consists of a dataflow representation of the body of a loop that is to be iterated indefinitely; dataflow programming in this form has been employed extensively in the DSP domain.
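Redundancy in this setting is commonly tested on a synchronization graph whose edges carry delays (initial tokens): a synchronization edge from u to v with delay d is redundant if some other path from u to v has total delay no greater than d, because that path already enforces the same ordering. The minimal Python sketch below applies this test with a shortest-path search and removes redundant edges one at a time, so that two edges cannot justify each other's removal. It covers only redundancy elimination, not the insertion of new synchronizations, and the names and example graph are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Edge = Tuple[str, str, int]  # (source actor, sink actor, delay in iterations)


def min_delay(edges: List[Edge], src: str, dst: str) -> float:
    """Minimum total delay over paths src -> dst (Dijkstra; delays >= 0)."""
    adj: Dict[str, List[Tuple[str, int]]] = {}
    for u, v, d in edges:
        adj.setdefault(u, []).append((v, d))
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        du, u = heapq.heappop(heap)
        if u == dst:
            return du
        if du > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, d in adj.get(u, []):
            if du + d < dist.get(v, float("inf")):
                dist[v] = du + d
                heapq.heappush(heap, (du + d, v))
    return float("inf")


def remove_redundant(edges: List[Edge], sync: List[int]) -> List[Edge]:
    """Drop each sync edge that another remaining path already enforces."""
    kept = list(range(len(edges)))
    for i in sync:
        u, v, d = edges[i]
        others = [edges[j] for j in kept if j != i]
        if min_delay(others, u, v) <= d:
            kept.remove(i)
    return [edges[j] for j in kept]


# Processor 1 runs A then B; processor 2 runs C then D.
edges = [("A", "B", 0), ("C", "D", 0), ("B", "C", 0), ("A", "D", 0)]
print(remove_redundant(edges, [2, 3]))  # A -> D is implied by A -> B -> C -> D
```

Resynchronization goes one step further by adding edges such as B to C deliberately when doing so makes more existing synchronizations redundant than it costs.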