The research reported in this paper concerns difficult black-box problems solved with popular metaheuristic algorithms implemented on current parallel, multi-core, and many-core platforms. In a series of publications we aim to show how particular population-based techniques can benefit further from dedicated hardware, such as GPGPUs or FPGAs, by delegating selected parts of the computation to it in order to speed the computation up. The main contribution of this paper is an experimental study that profiles different ways of implementing the Scatter Search algorithm, in particular delegating selected components to a GPGPU. As a result, concise know-how on implementing a population-based metaheuristic similar to Scatter Search is presented, using a difficult discrete optimization problem, the Golomb Ruler problem, as a benchmark.