With the increase in the complexity of models and lack of flexibility offered by the analog computers, coupled with the advancements in digital hardware, the simulation industry has subsequently moved to digital computers and increased usage of programming languages such as C, C++, and MATLAB. However, the reduced time-step required to simulate complex and fast systems imposes a tighter constraint on the time within which the computations have to be performed. The sequential execution of these computations fails to cope with the real-time constraints which further restrict the usefulness of Real-Time Simulation (RTS) in a Virtual Reality (VR) environment. In this paper, we present a methodology for the design and implementation of RTS algorithms, based on the use of Field-Programmable Gate Array (FPGA) technology. We apply our methodology to an 8th order steering valve subsystem of a vehicle with relatively low response time requirements and use the FPGA technology to improve the response time of this model. Our methodology utilizes traditional hardware/software co-design approaches to generate a heterogeneous architecture for an FPGA-based simulator by porting the computationally complex regions to hardware. The hardware design was optimized such that it efficiently utilizes the parallel nature of FPGAs and pipelines the independent operations. Further enhancement was made by building a hardware component library of custom accelerators for common non-linear functions. The library also stores the information about resource utilization, cycle count, and the relative error with different bit-width combinations for these components, which is further used to evaluate different partitioning approaches. In this paper, we illustrate the partitioning of a hardware-based simulator design across dual FPGAs, initiate RTS using a system input from a Hardware-in-the-Loop (HIL) framework, and use these simulation results from our FPGA-based platform to perform response analysis. The total simulation time, which includes the time required to receive the system input over a socket (without HIL), software initialization, hardware computation, and transfer of simulation results back over a socket, shows a speedup of 2 × as compared to a similar setup with no hardware acceleration. The correctness of the simulation output from the hardware has also been validated with the simulated results from the software-only design.