Hardware/software (HW/SW) partitioning is an essential step in hardware/software co-design because it determines which functions are implemented in hardware and which in software. HW/SW partitioning problems are NP-hard, and the increasing complexity of modern embedded systems makes them harder still, which motivates heuristic approaches. Tabu search is an effective heuristic for HW/SW partitioning, but it is time-consuming. Existing tabu search methods for HW/SW partitioning are sequential implementations that must trade off solution time against solution quality, and this trade-off typically sacrifices solution quality. This paper presents a GPU-based adaptive compacting neighborhood tabu search for HW/SW partitioning. First, we present an adaptive strategy that strengthens search intensification to improve solution quality; the massive parallelism of the GPU architecture reduces the solution time of this strategy. Next, to ensure that the algorithm executes efficiently on the GPU, we propose several implementation strategies, including the representation of the task graph on the GPU, the mapping between GPU threads and candidates, and data layout and memory access optimization. Finally, we implement the proposed method in the Compute Unified Device Architecture (CUDA) and evaluate its effectiveness on benchmarks with different communication-to-computation ratios and real-time constraint requirements. The results show that the proposed method yields better solution quality than existing methods. Compared with a naive GPU implementation of the adaptive compacting neighborhood tabu search, the proposed implementation strategies significantly reduce the solution time. In addition, we verify the advantage of the GPU-based method for very large HW/SW partitioning problems.
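
For orientation, a basic sequential tabu search for HW/SW partitioning can be sketched as follows. This is a minimal illustration under stated assumptions, not the adaptive compacting neighborhood method proposed here: the cost model (hardware area plus a penalty for exceeding a deadline, with communication cost paid on edges that cross the HW/SW boundary), the single-flip neighborhood, and all names such as `cost`, `tabu_search`, `tenure`, and `penalty` are hypothetical. The loop that evaluates every single-task flip is the kind of work a GPU version could plausibly distribute across threads, one candidate per thread.

```python
def cost(assign, sw_time, hw_time, hw_area, edges, deadline, penalty=1000.0):
    """Assumed cost model: hardware area plus a penalty for missing the deadline."""
    n = len(assign)
    # Execution time of each task depends on where it is placed (1 = HW, 0 = SW).
    time = sum(hw_time[i] if assign[i] else sw_time[i] for i in range(n))
    # Communication cost is paid only on edges crossing the HW/SW boundary.
    time += sum(c for (u, v, c) in edges if assign[u] != assign[v])
    area = sum(hw_area[i] for i in range(n) if assign[i])
    return area + penalty * max(0.0, time - deadline)


def tabu_search(sw_time, hw_time, hw_area, edges, deadline, iters=200, tenure=3):
    """Single-flip tabu search with a fixed tenure and an aspiration criterion."""
    n = len(sw_time)
    current = [0] * n  # start with every task in software
    best = current[:]
    best_cost = cost(current, sw_time, hw_time, hw_area, edges, deadline)
    tabu_until = [0] * n  # task i may not be flipped again before this iteration

    for it in range(1, iters + 1):
        # Evaluate all single-task flips; each evaluation is independent,
        # so this loop is the natural candidate for parallel execution.
        candidates = []
        for i in range(n):
            neighbor = current[:]
            neighbor[i] ^= 1
            c = cost(neighbor, sw_time, hw_time, hw_area, edges, deadline)
            candidates.append((c, i))
        candidates.sort()

        # Take the best move that is not tabu, or that beats the best known
        # solution (aspiration criterion).
        for c, i in candidates:
            if tabu_until[i] <= it or c < best_cost:
                current[i] ^= 1
                tabu_until[i] = it + tenure
                if c < best_cost:
                    best, best_cost = current[:], c
                break

    return best, best_cost


if __name__ == "__main__":
    # Hypothetical four-task chain, used only to exercise the sketch.
    sw = [10.0, 8.0, 6.0, 4.0]
    hw = [2.0, 3.0, 1.0, 1.0]
    area = [5.0, 4.0, 3.0, 2.0]
    chain = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
    print(tabu_search(sw, hw, area, chain, deadline=15.0))
```

In this sketch the tabu list only forbids re-flipping a recently moved task for `tenure` iterations; the intensification strategy and the GPU data layout described in the abstract are not modeled.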