Abstract

A forward model characterizes the relationship between a source and a measurement in computational form. In transcranial magnetic stimulation (TMS), such models are used for planning and targeting the stimulation. The contribution of the head volume conductor to the forward model is typically solved using the boundary element method (BEM) or the finite element method (FEM). Here we consider BEM models in the context of real-time TMS navigation and the performance benefits of C++ and GPU (CUDA) code over MATLAB. MATLAB is highly efficient in matrix computations but rather slow in other computations, while GPUs excel in massively parallel problems. We implemented linear Galerkin (LG) BEM solvers in C++ and CUDA and benchmarked them against our optimized MATLAB solver [1] in constructing and using a TMS forward model. The test model had a realistic gyral structure and contained 21000 surface nodes and 10200 cortical dipole triplets. On a standard 2019 PC with an entry-level GPU (Nvidia GTX 1060), we built the BEM model in 75 seconds and solved the full surface potentials of all cortical dipole triplets in 15 seconds using C++/CUDA, compared with over 20 minutes and 104 seconds, respectively, in MATLAB (without GPU). When this model was used in TMS simulation with a 42-dipole coil, the electric field was solved for 49 coil positions per second (cps); GPU-accelerated MATLAB and C++ were equally fast. With a 15000-dipole coil model, the C++/CUDA code ran at 44 cps, while the MATLAB code was over 50 times slower (< 0.1 cps). In TMS navigation, the use of a GPU and C++/CUDA allows building the whole 4-compartment model and defining regions of interest during the stimulation session, instead of building the model offline and saving and loading 1.8-2.5 GB model files. Further, the use of C++/CUDA enables real-time performance with practically any coil model.

[1] Stenroos and Koponen, NeuroImage 2019

Research Category and Technology and Methods
Basic Research: 10. Transcranial Magnetic Stimulation (TMS)

Keywords: TMS navigation, forward model, GPU, real-time
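
As a rough cross-check of the figures reported in the abstract, the short Python sketch below derives the implied speedups, the per-position time budget, and an estimated model size. It is only illustrative arithmetic: the timings are taken from the abstract (the MATLAB build time is a lower bound, since it is given as "over 20 minutes"), while the assumption that the stored model is a dense matrix of surface potentials with one column per dipole component is ours and is not stated in the abstract.

```python
# Figures quoted in the abstract.
matlab_build_s = 20 * 60   # "over 20 minutes", so this is a lower bound
cuda_build_s = 75
matlab_solve_s = 104
cuda_solve_s = 15

build_speedup = matlab_build_s / cuda_build_s   # at least this large
solve_speedup = matlab_solve_s / cuda_solve_s

# Time available per coil position at the reported throughputs.
ms_per_position_small_coil = 1000 / 49   # 42-dipole coil, 49 cps
ms_per_position_large_coil = 1000 / 44   # 15000-dipole coil, 44 cps

# ASSUMPTION (not from the abstract): the model is stored as a dense
# (surface nodes) x (3 * dipole triplets) matrix of surface potentials.
n_nodes = 21_000
n_triplets = 10_200
n_entries = n_nodes * 3 * n_triplets


def dense_size_gb(entries, bytes_per_entry):
    """Size of a dense array in decimal gigabytes."""
    return entries * bytes_per_entry / 1e9


size_single_gb = dense_size_gb(n_entries, 4)   # 32-bit floats
size_double_gb = dense_size_gb(n_entries, 8)   # 64-bit floats

print(f"build speedup  >= {build_speedup:.1f}x")
print(f"solve speedup  ~  {solve_speedup:.1f}x")
print(f"time per position: {ms_per_position_small_coil:.1f} ms (42 dipoles), "
      f"{ms_per_position_large_coil:.1f} ms (15000 dipoles)")
print(f"dense model size: {size_single_gb:.2f} GB (single), "
      f"{size_double_gb:.2f} GB (double)")
```

Under that storage assumption, the single-precision estimate is of the same order as the 1.8-2.5 GB file sizes quoted in the abstract, which is consistent with the motivation for rebuilding the model during the session rather than loading it from disk.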