Real-world engineering involves many complex problems. Their computational complexity ranges from simple, efficiently solvable problems up to NP-hard problems, for which no efficient exact algorithm is known. Hyper-heuristic algorithms therefore aim to find good approximate solutions to NP-hard problems. Hyper-heuristics optimize complex search problems by combining exploration and exploitation strategies. Current algorithms lack generalizability, handle only specific data types, are limited to a particular search problem, and perform weakly. We propose the World hyper-heuristic (World) to address these issues using a novel reinforcement learning method. In two steps, rewarding and selection, World dynamically switches between the exploration and exploitation strategies provided in an infinite pool of meta-heuristics. We evaluated the performance of the proposed method in three phases. First, we optimized standard benchmark functions as artificial NP-hard problems. Then, we compared results on real engineering examples of discrete NP-hard problems. Finally, we implemented and analyzed continuous NP-hard problems. Our extensive comparisons with state-of-the-art algorithms demonstrate that World outperforms them in handling varied search problems with any data type. Among all the findings, World finds a shortest path of length 4.33e+05, far shorter than the results of the state-of-the-art work, on benchmark data of 10,000 real cities.
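To make the two-step idea of rewarding and selection concrete, the following is a minimal Python sketch of a generic reward-based selection hyper-heuristic. It is not the World algorithm itself: the epsilon-greedy selection rule, the binary reward, the running-average score update, the two toy low-level heuristics (`explore`, `exploit`), the finite pool of size two, and the sphere objective are all assumptions chosen only for illustration.

```python
import random


def sphere(x):
    """Toy continuous objective (assumed for illustration): sum of squares."""
    return sum(v * v for v in x)


def explore(x, rng):
    """Hypothetical exploration move: a large uniform perturbation."""
    return [v + rng.uniform(-1.0, 1.0) for v in x]


def exploit(x, rng):
    """Hypothetical exploitation move: a small Gaussian perturbation."""
    return [v + rng.gauss(0.0, 0.05) for v in x]


def hyper_heuristic(objective, x0, heuristics, iters=2000,
                    epsilon=0.2, alpha=0.1, seed=0):
    """Generic selection hyper-heuristic (a sketch, not the paper's method).

    Selection: epsilon-greedy choice over per-heuristic scores.
    Rewarding: reward 1 if the chosen heuristic improved the incumbent,
    else 0, blended into its score with learning rate alpha.
    """
    rng = random.Random(seed)
    scores = [0.0] * len(heuristics)
    best, best_f = list(x0), objective(x0)
    for _ in range(iters):
        # Selection step: mostly pick the best-scoring heuristic,
        # occasionally pick one at random.
        if rng.random() < epsilon:
            k = rng.randrange(len(heuristics))
        else:
            k = max(range(len(heuristics)), key=scores.__getitem__)
        candidate = heuristics[k](best, rng)
        f = objective(candidate)
        # Rewarding step: credit the heuristic only when it improves.
        improved = f < best_f
        reward = 1.0 if improved else 0.0
        scores[k] += alpha * (reward - scores[k])
        if improved:
            best, best_f = candidate, f
    return best, best_f, scores


if __name__ == "__main__":
    start = [3.0, -2.0, 1.0]
    solution, value, final_scores = hyper_heuristic(
        sphere, start, [explore, exploit])
    print("initial objective:", sphere(start))
    print("final objective:  ", value)
    print("heuristic scores: ", final_scores)
```

In this sketch the scores act as the learned preference between exploration and exploitation: large moves tend to earn rewards early, while small moves tend to earn them once the search is close to an optimum, so the preferred heuristic can change during the run.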