The solution of linear systems is a crucial and indispensable technique in the field of numerical analysis. Among linear system solvers, the cyclic reduction algorithm stands out for its natural inclination to parallelization. So far, the cyclic reduction has been applied primarily to linear systems with almost block diagonal matrices. Some of its variants widen the usage to almost block diagonal matrices with a last block of rows introducing a set of non-zero elements in the first group of columns. In this work, we extend cyclic reduction to matrices with additional non-zero elements below and to the right without any limitations. These matrices, called padded bordered almost block diagonal, arise from the discretization of optimal control problems featuring arbitrary boundary conditions and free parameters. Nonetheless, they also appear in two-point boundary value problems with free parameters. The proposed algorithm is based on the LU factorizations, and it is designed to be executed in parallel on multi-thread architectures. The algorithm performance is assessed through numerical experiments with different matrix sizes and threads. The computation times and speedups obtained with the parallel implementation indicate that the suggested algorithm is a robust solution for solving padded bordered almost block diagonal linear systems. Furthermore, its structure makes it suitable for the use of different matrix factorization techniques, such as QR or SVD. This flexibility enables tailored customization of the algorithm on the basis of the specific application requirements.