In recent years, there has been an increasing need to develop new implementation techniques and design methodologies for DSP systems. Algorithmic and architectural optimizations are key to developing high-performance signal and information processing systems under strict constraints on implementation complexity and power consumption. This special issue is composed of a selection of papers reporting on advances in the design and implementation of signal processing systems. The topics range from domain-specific hardware implementation to design methodologies for signal processing algorithm implementations.

In "An energy-efficient reconfigurable ASIP supporting multi-mode MIMO detection" (10.1007/s11265-015-0972-x), Ahmad, Li, Amin, Li, Van der Perre, Lauwereins, and Pollin present a programmable ASIP MIMO baseband processor. They first present an efficient modification of the Multi-Tree Selective Spanning Detector algorithm. They then introduce a soft-output algorithm for generating log-likelihood ratios, called counter-ML bit-flipping. A C-programmable ASIP is designed in 40 nm CMOS, operating at 3.6 Gbps for hard MIMO detection and 2.05 Gbps for soft detection.

Tripakis, Limaye, Ravindran, Wang, Andrade, and Ghosal consider models of dataflow computation in their paper "Tokens vs. Signals: On Conformance between Formal Models of Dataflow and Hardware" (10.1007/s11265-015-0971-y). They define a formal conformance relation between finite state machines with synchronous semantics and a formal model of dataflow: asynchronous processes communicating via queues. The conformance relation can be used to assess the accuracy of hardware models, to highlight timing and synchronization errors, and to derive performance properties.
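As background for readers unfamiliar with soft-output detection, the log-likelihood ratio (LLR) that such detectors compute is conventionally defined, and commonly simplified via the max-log approximation, as follows. This is the standard textbook formulation, not the paper's counter-ML bit-flipping equations, which are a further low-complexity approximation of this quantity:

```latex
L(b_k \mid \mathbf{y})
= \ln \frac{\displaystyle\sum_{\mathbf{s}\,:\,b_k(\mathbf{s})=1} \exp\!\bigl(-\|\mathbf{y}-\mathbf{H}\mathbf{s}\|^2 / N_0\bigr)}
           {\displaystyle\sum_{\mathbf{s}\,:\,b_k(\mathbf{s})=0} \exp\!\bigl(-\|\mathbf{y}-\mathbf{H}\mathbf{s}\|^2 / N_0\bigr)}
\;\approx\;
\frac{1}{N_0}\Bigl(\min_{\mathbf{s}\,:\,b_k(\mathbf{s})=0}\|\mathbf{y}-\mathbf{H}\mathbf{s}\|^2
- \min_{\mathbf{s}\,:\,b_k(\mathbf{s})=1}\|\mathbf{y}-\mathbf{H}\mathbf{s}\|^2\Bigr)
```

Here \(\mathbf{y}\) is the received vector, \(\mathbf{H}\) the channel matrix, \(\mathbf{s}\) a candidate transmit symbol vector, and \(N_0\) the noise variance; the max-log form replaces the exhaustive sums with the two nearest counter-hypothesis candidates, which is what makes hardware-friendly approximations such as bit-flipping attractive.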
In "A dynamic modulo scheduling with binary translation: Loop optimization with software compatibility" (10.1007/s11265-015-0974-8), Ferreira, Denver, Pereira, Wong, Lisboa, and Carro propose a binary translation technique for run-time modulo scheduling of loops onto coarse-grained reconfigurable arrays. The technique eliminates the need to generate an intermediate dataflow graph (DFG) and uses a greedy placement step. The experimental results show that the run-time technique can achieve higher instruction-level parallelism than a 16-issue VLIW processor.

Akin, Franchetti, and Hoe present restructured Fast Fourier Transform (FFT) algorithms with efficient memory access patterns in their paper "FFTs with Near-Optimal Memory Access Through Block Data Layouts: Algorithm, Architecture and Design Automation" (10.1007/s11265-015-1018-0). They use a formal representation of the FFT based on the Kronecker product to automatically generate hardware implementations of DRAM-optimized FFT algorithms. Results for 1D, 2D, and 3D FFTs show that their designs can achieve close to the theoretical peak performance on several different platforms.

In "Analyzing the Performance-Hardware Trade-off of an ASIP-based SIFT Feature Extraction" (10.1007/s11265-015-0986-4), Mentzer, Paya-Vaya, and Blume consider the implementation of the Scale-Invariant Feature Transform (SIFT) used in computer vision. The complexity of the SIFT algorithm is too high for real-time implementation on

* Warren J. Gross
warren.gross@mcgill.ca
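For readers unfamiliar with the Kronecker-product formalism that Akin, Franchetti, and Hoe build on, the classical Cooley-Tukey decomposition of an \(N\)-point DFT with \(N = km\) can be written in this notation as follows. This is the standard factorization, not the paper's specific DRAM-optimized variant:

```latex
\mathrm{DFT}_{km} \;=\; (\mathrm{DFT}_k \otimes I_m)\; T^{km}_m\; (I_k \otimes \mathrm{DFT}_m)\; L^{km}_k
```

where \(\otimes\) is the Kronecker product, \(T^{km}_m\) is a diagonal matrix of twiddle factors, and \(L^{km}_k\) is a stride permutation. Because the stride permutation directly encodes a data-access pattern, rewriting such factorizations is a natural vehicle for deriving memory-layout-friendly FFT variants and for automating their hardware generation.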