Abstract

Improvements in semiconductor technology have enabled smaller feature sizes, higher clock speeds, and higher performance. Improvements in computer architecture have been enabled by RISC designs and efficient high-level language compilers. Together, these advances have made customized computer architectures possible, from systems-on-chip to powerful GPUs and high-performance processors.

Users expect the CPU to access a practically unlimited amount of memory with low latency, yet fast memory costs many times more than slower memory. CPU memory accesses also follow the principle of spatial and temporal locality. The solution is to organize memory into a hierarchy, caching data at different levels. Section 12.3 covers cache basics in detail.

Because of space and cost, not all memory addressable by the CPU needs to be in physical memory; it can reside on disk, with the address range mapped by the virtual memory manager. A virtual address consists of a page number and an offset within the page. A page is placed in an available free page slot in physical memory, and the mapping is recorded in the page table; thus virtual memory is mapped onto physical memory. Section 12.4 describes virtual memory management in detail.

RISC stands for Reduced Instruction Set Computer. In a RISC processor the clocks per instruction (CPI) is one: the architecture uses an optimized set of instructions, each executed in one cycle. This enables pipelining, by which multiple instructions are executed simultaneously in different stages. RISC processors have many registers, simple instruction decoding, and simple addressing modes. Section 12.5 explains RISC architectures in detail.

An efficient implementation of instruction execution overlaps the execution of successive instructions so that each hardware unit is kept busy all the time. Section 12.6 explains this concept of pipelining and how hazards are controlled in the architecture. Several advances in pipelined architectures have been developed, but their performance improvements saturate as new constraints and implementation issues arise.

When a single instruction operates on multiple data elements in a single instruction cycle, the instructions are called Single Instruction Multiple Data (SIMD) instructions. Section 12.7 introduces data-level parallelism with vector processing, and Section 12.9 introduces Single Instruction, Multiple Threads (SIMT) execution in GPUs.

Certain types of programs are inherently parallel, with very little dependence among their parts; these parts are called threads of execution. Thread-Level Parallelism (TLP) is explained in detail in Section 12.10.

FPGA-based technology has made system-on-chip design a cakewalk. Systems with high-performance requirements can be built with hardware configured to those requirements, and temporal reconfiguration in FPGAs, mimicking DLLs in software, allows the same FPGA fabric to be reused for just-in-time "use and throw" hardware blocks. Section 12.11 covers reconfigurable computing in detail.

After reading this chapter, readers will be able to understand the internal architecture of any processor, which helps in selecting a processor for individual requirements.
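As a brief aside on the locality principle behind the cache hierarchy, the following sketch (an illustrative example, not taken from the chapter) traverses a matrix in row-major order so that consecutive accesses fall within the same cache line:

```c
#include <stdio.h>

#define N 1024

static double m[N][N];

int main(void)
{
    double sum = 0.0;

    /* Row-major traversal: consecutive iterations touch adjacent
       addresses, so most accesses hit the cache line brought in by
       the first miss (spatial locality). Swapping the two loops
       strides N * sizeof(double) bytes per access and misses far
       more often. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += m[i][j];

    printf("sum = %f\n", sum);
    return 0;
}
```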
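The page-number/offset split of a virtual address can be sketched in a few lines of C. The 4 KiB page size, the tiny page table, and the frame numbers below are assumptions made purely for illustration:

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE  4096u
#define PAGE_SHIFT 12u               /* log2(PAGE_SIZE) */
#define PAGE_MASK  (PAGE_SIZE - 1u)

/* Toy page table: virtual page number -> physical frame number,
   filled with made-up values a VM manager would maintain. */
static uint32_t page_table[16] = { 5, 9, 8, 7 };

static uint32_t translate(uint32_t vaddr)
{
    uint32_t vpn    = vaddr >> PAGE_SHIFT;  /* page number        */
    uint32_t offset = vaddr &  PAGE_MASK;   /* offset within page */
    uint32_t pfn    = page_table[vpn];      /* page-table lookup  */
    return (pfn << PAGE_SHIFT) | offset;    /* physical address   */
}

int main(void)
{
    uint32_t va = (2u << PAGE_SHIFT) | 0x34u;  /* page 2, offset 0x34 */
    printf("virtual 0x%08x -> physical 0x%08x\n", va, translate(va));
    return 0;
}
```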
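The significance of a CPI of one follows from the standard processor performance equation (a textbook identity rather than something specific to this chapter), shown here with an illustrative calculation:

```latex
T_{\mathrm{CPU}} = \mathrm{IC} \times \mathrm{CPI} \times T_{\mathrm{clk}}
% e.g. IC = 10^9 instructions, CPI = 1, 1 GHz clock (T_clk = 1 ns):
% T_CPU = 10^9 \times 1 \times 1\,\mathrm{ns} = 1\,\mathrm{s}
```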
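To make the SIMD idea concrete, here is a minimal sketch using x86 SSE intrinsics (an assumed target; the chapter itself is not tied to SSE), in which a single instruction adds four floats at once:

```c
#include <stdio.h>
#include <xmmintrin.h>  /* x86 SSE intrinsics */

int main(void)
{
    float a[4] = {1, 2, 3, 4};
    float b[4] = {10, 20, 30, 40};
    float c[4];

    /* One addps instruction adds all four lanes in a single
       instruction cycle, where scalar code would need four adds. */
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(c, _mm_add_ps(va, vb));

    for (int i = 0; i < 4; i++)
        printf("%g ", c[i]);            /* prints: 11 22 33 44 */
    printf("\n");
    return 0;
}
```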
