Most of the research on parallel data mining and machine learning algorithms focuses on improving the efficiency of existing algorithms. Our focus, in contrast, is on improving solution quality, or model accuracy. We are looking for "smart" strategies to invest parallel compute resources in order to achieve a better exploration of the search space by exploring several solutions in parallel, an approach referred to as Widening. In this paper, we discuss the theoretical properties of a neighborhood-based Widening using a particular type of neighborhood, the optimality neighborhood, and contrast this communicationless approach to the straightforward beam-like Top-k Widening approach, which requires communication. We show a bound on the number of parallel workers needed for the communicationless approach to guarantee a solution of the same quality as the Top-k approach. In addition to the theoretical comparison, we compare the two approaches experimentally in terms of running time and final solution quality, using a widened version of the greedy algorithm for the set cover problem.
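To make the beam-like Top-k Widening concrete, the following is a minimal sketch of a widened greedy set cover: instead of extending a single partial cover greedily, it keeps the k best partial covers at each step. The function name, data representation, and scoring (rank candidates by number of covered elements) are illustrative assumptions, not details from the paper.

```python
def topk_widened_set_cover(universe, subsets, k=3):
    """Beam-like Top-k Widening of greedy set cover (illustrative sketch).

    At each step, all extensions of the current k partial covers are
    pooled and the k best are kept -- this joint Top-k selection is the
    communication step that the neighborhood-based variant avoids.
    """
    # Each beam entry: (frozenset of chosen subset indices, covered elements)
    beam = [(frozenset(), frozenset())]
    while True:
        complete = [sol for sol in beam if sol[1] >= universe]
        if complete:
            # Return the smallest complete cover found on the beam
            best = min(complete, key=lambda sol: len(sol[0]))
            return [subsets[i] for i in sorted(best[0])]
        candidates = {}
        for chosen, covered in beam:
            for i, s in enumerate(subsets):
                if i in chosen or not (s - covered):
                    continue  # skip subsets that add no new elements
                new_chosen = chosen | {i}
                candidates[new_chosen] = (new_chosen, covered | s)
        if not candidates:
            return None  # the universe cannot be covered
        # Top-k selection: keep the k candidates covering the most elements
        beam = sorted(candidates.values(),
                      key=lambda sol: len(sol[1]), reverse=True)[:k]
```

With k = 1 this reduces to the standard greedy heuristic; larger k trades extra (parallelizable) work per step for a wider exploration of the search space.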