This work presents an approach that automatically optimizes component computations on graphics processing unit (GPU) devices from different vendors. The approach consists of two optimization levels: the first considers the linear part of the computation for vectorization and applies mixed matrix formats to further increase computational throughput. The second level treats the combination of the linear and non-linear parts as a black box and searches for the optimal configuration of parameters such as the degree of vectorization, the combination of matrix formats, and the sizes of thread groups during parallel execution on the GPU. Moreover, we introduce constraints that reduce the execution time of the optimization procedure. Finally, we select three types of components that are representative of computational tasks in power systems and apply our optimization approach to these kernels. The computational performance is compared with an unoptimized baseline and with implementations based on sparse linear algebra libraries; the results show that our optimization achieves better performance and more efficient memory utilization.