Abstract

To meet demand in the field of industrial control, we developed TS400, a CPU core based on the Armv8-M architecture [1] whose performance is comparable to that of the Arm Cortex-M33 [2]. The TS400 has a four-stage pipeline, which allows a higher clock frequency than the three-stage Cortex-M33. A deeper pipeline, however, introduces extra idle clock cycles whenever the pipeline is flushed, which raises the clock cycles per instruction (CPI) and lowers the CoreMark score. Accurate instruction fetch and branch prediction can effectively reduce the impact of pipeline flushes, at the cost of extra logic resources. In the TS400, we instead designed an aggressive branch instruction prefetch method. Compared with branch prediction, this method needs no complex prediction logic and is well suited to embedded CPU design. The method has two parts: 1) minimizing the time cost of fetching a conditional branch target by taking the branch first and confirming the branch outcome afterwards; 2) optimizing the timing of the bus control signals for the prefetched instruction so that the target address prefetch is answered in time. Together, these reduce the pipeline stalls caused by conditional branch instructions as far as possible, so the TS400 achieves better performance than the Cortex-M33 at the same clock frequency while also reaching a higher clock frequency than the Cortex-M33.
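
The relationship between branch penalty and CPI described above can be illustrated with a rough first-order model. This is only a sketch under stated assumptions, not the TS400 design or its measured results: the function name `effective_cpi`, the branch fraction, and the penalty values are hypothetical, chosen purely to show how shaving idle cycles off each conditional branch lowers the average CPI.

```python
def effective_cpi(base_cpi, branch_fraction, avg_branch_penalty):
    """First-order CPI estimate (illustrative assumption, not from the paper).

    base_cpi           -- CPI with no branch-related idle cycles
    branch_fraction    -- share of executed instructions that are conditional branches
    avg_branch_penalty -- average idle cycles added per conditional branch
    """
    return base_cpi + branch_fraction * avg_branch_penalty


# Hypothetical workload: 20% conditional branches, ideal CPI of 1.0.
BRANCH_FRACTION = 0.2

# Hypothetical penalties: 2 idle cycles per branch without aggressive
# prefetch versus 1 idle cycle with it. Neither figure comes from the paper.
cpi_without_prefetch = effective_cpi(1.0, BRANCH_FRACTION, 2.0)
cpi_with_prefetch = effective_cpi(1.0, BRANCH_FRACTION, 1.0)

# At equal clock frequency, run time scales with CPI, so the ratio of the
# two CPI values is the speedup in this simplified model.
speedup_same_clock = cpi_without_prefetch / cpi_with_prefetch

print(f"CPI without prefetch: {cpi_without_prefetch:.2f}")
print(f"CPI with prefetch:    {cpi_with_prefetch:.2f}")
print(f"Speedup at same clock: {speedup_same_clock:.3f}x")
```

The model ignores memory wait states, other hazards, and the difference between taken and not-taken branches; it is meant only to show why a smaller per-branch penalty matters more as the pipeline gets deeper.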
