Abstract

Slack-based algorithms are popular bin-focused heuristics for the bin packing problem (BPP). Existing methods select slacks only according to predetermined policies, ignoring dynamic exploration of the global data structure and thus leaving the information in the data space underutilized. In this paper, we propose a novel slack-based flexible bin packing framework, the reinforced bin packing framework (RBF), for the one-dimensional BPP. RBF jointly considers an RL-system, an instance-eigenvalue mapping process, and a reinforced-MBS strategy. In our work, the slack is generated with a reinforcement learning strategy, in which performance-driven rewards capture the intuition of learning the current state of the container space, the action is the choice of the packing container, and the state is the remaining capacity after packing. During the construction of the slack, an instance-eigenvalue mapping process is designed and used to generate a representative and classified validation set. Furthermore, the resulting slack coefficient is integrated into the MBS-based packing process. Experimental results show that, in comparison with fit algorithms, MBS and MBS’, RBF achieves state-of-the-art performance on the BINDATA and SCH_WAE datasets. In particular, it outperforms its baselines MBS and MBS’, increasing the number of optimal solutions by 189.05% and 27.41% on average, respectively.
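Read as pseudocode, the formulation above corresponds to a standard tabular Q-learning loop. The sketch below is only an illustration of that reading: the bin capacity, the per-container state encoding, the candidate actions, the epsilon-greedy policy, and the negative-slack reward are our assumptions, not the paper's exact configuration.

```python
import random
from collections import defaultdict

# Illustrative tabular Q-learning for slack selection; the capacity, the
# candidate actions, and the reward shaping are assumptions for this sketch.
CAPACITY = 100                       # assumed bin capacity
ACTIONS = list(range(4))             # assumed: indices of four open containers
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

Q = defaultdict(float)               # Q[(state, action)] -> estimated value

def choose_action(state):
    """Epsilon-greedy choice of the packing container (the action)."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    """One-step Q-learning update."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# One illustrative transition: the state is the remaining capacity after
# packing, and the performance-driven reward is taken here as the negative
# slack left in the chosen container (an assumption for illustration).
remaining = [40, 15, 70, 100]        # assumed remaining capacity per container
item = 35
state = tuple(remaining)
action = choose_action(state)
if remaining[action] >= item:
    remaining[action] -= item
    reward = -remaining[action]      # smaller leftover slack -> larger reward
else:
    reward = -CAPACITY               # infeasible container choice is penalised
q_update(state, action, reward, tuple(remaining))
```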

Highlights

  • As a classical discrete combinatorial optimization problem [1, 2], the bin packing problem (BPP) [3, 4] aims to minimize the number of bins used to pack all items, and it is NP-hard [5, 6]

  • To address the shortcomings of minimum bin slack (MBS) described above, we propose a reinforced bin packing framework, dubbed RBF, to solve the BPP, where a reinforcement learning (RL) method, i.e., the Q-learning algorithm, is exploited to select a high-quality slack for the packing process. RBF treats Q-learning as a prior detector of spatial information in the data

  • The framework of RBF is illustrated in Figure 1; it consists of an RL-system, a reinforced-MBS strategy, and an instance-eigenvalue mapping process, defined as follows: (1) RL-system: the RL-system generates a suitable slack via a reinforcement learning strategy, where the best-action selection is controlled by the Q-agent

Introduction

As a classical discrete combinatorial optimization problem [1, 2], the bin packing problem (BPP) [3, 4] aims to minimize the number of bins used to pack all items, and it is NP-hard [5, 6]. As a typical heuristic algorithm, minimum bin slack (MBS) is particularly useful for problems in which an optimal solution requires most of the bins, if not all, to be exactly filled [14]. It is also useful for solving problems in which the sum of the item requirements is less than or equal to twice the bin capacity. In MBS, the selection of the packing sequence of the items is based on a predetermined strategy, which ignores the sampling deviation among the data of the items to be packed and cannot explore the global data space.
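For reference, the MBS idea described here can be sketched as a one-bin-at-a-time depth-first search for the subset of remaining items that leaves the least slack; the `slack` parameter below hints at how a nonzero, learned slack coefficient (as in RBF) would relax the "exactly filled" target. The function name, parameters, and early-stopping details are ours, not taken from the paper.

```python
def mbs_pack(items, capacity, slack=0):
    """Minimum-bin-slack style packing (illustrative sketch, not the paper's code).

    For each new bin, a depth-first search looks for the subset of the
    remaining items that leaves the least unused space.  With slack=0 this
    mimics classic MBS ("fill the bin exactly if possible"); a learned slack
    coefficient, as in RBF, would pass a larger tolerance here.
    """
    remaining = sorted(items, reverse=True)
    bins = []
    while remaining:
        best_subset, best_slack = [], capacity

        def search(start, chosen, load):
            nonlocal best_subset, best_slack
            if capacity - load < best_slack:
                best_subset, best_slack = list(chosen), capacity - load
            if best_slack <= slack:                 # good enough: stop early
                return
            for i in range(start, len(remaining)):
                if load + remaining[i] <= capacity:
                    chosen.append(remaining[i])
                    search(i + 1, chosen, load + remaining[i])
                    chosen.pop()
                    if best_slack <= slack:
                        return

        search(0, [], 0)
        bins.append(best_subset)
        for item in best_subset:
            remaining.remove(item)
    return bins

# Items that can fill 10-unit bins exactly end up in three exactly-filled bins.
print(mbs_pack([7, 3, 6, 4, 2, 8], capacity=10))    # [[8, 2], [7, 3], [6, 4]]
```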
