When multiple objects are positioned close together or stacked, pre-grasp operations such as pushing can create space for the grasp and thereby improve the grasping success rate. This study develops a model based on a deep Q-learning network architecture and introduces a fully convolutional network to identify the pixels in the workspace image that correspond to target locations for exploration. In addition, this study incorporates image masking to limit the exploration area of the robotic arm, so that the agent consistently explores regions containing objects. This approach mitigates the sparse reward problem and improves the convergence rate of the model. Experimental results from both simulated and real-world environments show that the proposed method accelerates the learning of effective grasping strategies. When image masking is applied, the success rate in the grasping task reaches 80% after 600 iterations. With image masking, the time required to reach an 80% success rate is 25% shorter than without it. The main contribution of this study is the direct integration of an image masking technique with a deep reinforcement learning (DRL) algorithm, which offers a significant advance in robotic arm control. Furthermore, this study shows that image masking can substantially reduce training time and improve the object grasping success rate. This integration enables the robotic arm to better adapt to scenarios that conventional DRL methods cannot handle, thereby improving training efficiency and performance in complex and dynamic industrial applications.
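The masked exploration idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the mask is computed, so the sketch assumes a depth heightmap thresholded against the empty workspace, and the names `object_mask`, `select_action`, and `height_threshold` are hypothetical. The pixel-wise Q-value map stands in for the output of the fully convolutional network.

```python
import numpy as np


def object_mask(heightmap, height_threshold=0.01):
    """Return a boolean mask that is True where the heightmap rises above
    the empty workspace (assumed way of locating objects)."""
    return heightmap > height_threshold


def select_action(q_map, mask, epsilon, rng):
    """Epsilon-greedy pixel selection restricted to masked pixels.

    q_map: 2D array of per-pixel Q-values (stand-in for the FCN output).
    mask:  2D boolean array of the same shape.
    Returns the (row, col) of the chosen pixel.
    """
    if not mask.any():
        # No object detected: fall back to the whole image.
        mask = np.ones_like(mask, dtype=bool)
    if rng.random() < epsilon:
        # Explore: sample uniformly among pixels that contain objects.
        flat = rng.choice(np.flatnonzero(mask))
    else:
        # Exploit: take the best Q-value among masked pixels only.
        flat = np.argmax(np.where(mask, q_map, -np.inf))
    return np.unravel_index(flat, q_map.shape)


rng = np.random.default_rng(0)
heightmap = np.zeros((4, 4))
heightmap[1:3, 1:3] = 0.05           # one object in the centre
q_map = rng.random((4, 4))
q_map[0, 0] = 10.0                   # spuriously high value on empty space
mask = object_mask(heightmap)

greedy = select_action(q_map, mask, epsilon=0.0, rng=rng)
explore = select_action(q_map, mask, epsilon=1.0, rng=rng)
```

In this sketch both the greedy and the exploratory choices fall on object pixels, even though the highest raw Q-value sits on empty space. Restricting candidate pixels in this way is one plausible reading of how masking keeps the agent away from actions that can never be rewarded.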