Model-Based Iterative Reconstruction (MBIR) algorithms repeatedly apply the computationally expensive forward- and back-projection operators. The irregular memory access patterns of these operators make the reconstruction a memory-bound application, and its computation time must be reduced to meet the time constraints of clinical routine. This paper proposes a hardware accelerator architecture for Field-Programmable Gate Arrays (FPGAs), programmed through a high-level language, as an alternative to GPU architectures. The acceleration relies on an offline memory-access analysis that addresses the algorithm's main bottleneck and maximizes the data reuse rate. This offline analysis makes it possible to tune the architecture parameters so that they converge toward an optimal solution. The Berkeley Roofline model then guides our optimization steps through iterative analysis of the design's performance. Our design flow significantly improves the algorithm's computational intensity and overcomes the memory bottleneck: the resulting architecture exploits the FPGA's local memory to achieve high memory bandwidth and keeps the pipeline busy without stalling the computation. Furthermore, we present a scaling-up strategy from a mid-range to a high-end FPGA and discuss the associated portability concerns. We implemented the algorithm on two Intel FPGA devices and compared the results with our GPU implementation in terms of speedup and energy efficiency. Our experimental results show that our design achieves higher computational throughput than the FPGA implementations reported in the literature.
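For reference, the Roofline model used above bounds the attainable performance of a kernel by the minimum of the device's peak compute throughput and the product of the kernel's arithmetic (computational) intensity with the peak memory bandwidth. In the standard formulation (the symbols below are generic, not parameters of our design),

$$
P_{\text{attainable}} = \min\!\left(P_{\text{peak}},\; I \times B_{\text{peak}}\right),
$$

where $I$ is the number of operations performed per byte moved from memory. Increasing $I$ through data reuse is what shifts a memory-bound kernel toward the compute-bound region of the roofline.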