Abstract

In the field of intelligent manufacturing, robot grasping and sorting are important tasks. However, traditional single-view manipulator grasping methods that rely on a 2D camera suffer from low efficiency and low accuracy in stacked and occluded scenes, for two reasons: a single-view 2D camera misses part of the scene information, and a grasping-only policy cannot change a scene that is hard to grasp because objects are stacked and occluded. To address these issues, this paper proposes a pushing-grasping collaborative method based on a deep Q-network with dual viewpoints. The method adopts an improved deep Q-network algorithm and uses an RGB-D camera to acquire RGB images and point clouds of the objects from two viewpoints, which resolves the problem of missing information. Moreover, it combines pushing and grasping actions within the deep Q-network, giving the trained manipulator the ability to actively explore and rearrange stacked and occluded objects, so that it performs well in more complicated grasping scenes. In addition, we improve the reward function of the deep Q-network and propose a piecewise reward function to speed up convergence. We trained different models and compared different methods in the V-REP simulation environment; the results show that the proposed method converges quickly and that the grasping success rate in unstructured scenes reaches 83.5%. The method also generalizes well, performing reliably when novel objects that the manipulator has never grasped before appear in the scene.
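
The piecewise reward is described only qualitatively above; the sketch below is a minimal illustration of how such a reward could be structured, assuming distinct returns for a successful grasp, a push that actually changes the scene, and an ineffective action. The numeric values and the scene_changed test are hypothetical assumptions for illustration, not the reward configuration reported in the paper.

    # Hypothetical piecewise reward for a pushing-grasping DQN agent.
    # The numeric values and the scene-change test are illustrative
    # assumptions, not the paper's actual reward configuration.
    def piecewise_reward(action_type, grasp_success, scene_changed):
        """Return a reward depending on the action taken and its outcome."""
        if action_type == "grasp":
            # Successful grasps receive the largest reward; failed grasps receive nothing.
            return 1.0 if grasp_success else 0.0
        if action_type == "push":
            # A push is only rewarded if it actually rearranged the stacked objects.
            return 0.5 if scene_changed else 0.0
        return 0.0

Rewarding an effective push with a smaller but non-zero return is one simple way to avoid the slow convergence that a single sparse grasp reward can cause.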

Highlights

  • In the field of intelligent manufacturing, robot grasping and sorting are important tasks

  • Compared with a grasping-only strategy, the deep Q-network (DQN) algorithm based on dual viewpoints proposed in this paper improves on three aspects (see the sketch after this list): ➀ dual viewpoints are used to obtain the objects’ information in the area to be grasped, avoiding the information missing from a single viewpoint; ➁ a pushing action is introduced, which compensates for the inability of grasping-only methods to change complicated environments with stacking and occlusion; ➂ a piecewise reward strategy is adopted, which solves the slow convergence caused by a single reward

  • After the model is trained, in order to verify that the pushing-grasping strategy based on DQN and dual viewpoints generalizes better than the grasping-only strategy based on DQN and a single viewpoint, three groups of experiments are designed with three unknown objects: triangles, semicircles, and cylinders
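
To make improvements ➀ and ➁ concrete, the following is a minimal sketch of how a dual-viewpoint pushing-grasping policy could select its next action, assuming the two RGB-D views are fused into a single scene representation and that separate networks output pixel-wise Q-value maps for pushing and grasping. All names (fuse_viewpoints, grasp_net, push_net) are hypothetical placeholders, not identifiers from the paper.

    import numpy as np

    # Hypothetical action selection for a dual-viewpoint pushing-grasping DQN.
    # fuse_viewpoints, grasp_net and push_net stand in for the paper's actual
    # fusion step and value networks.
    def select_action(rgbd_view_1, rgbd_view_2, grasp_net, push_net, fuse_viewpoints):
        # Fuse the two RGB-D viewpoints into one scene representation (e.g. a heightmap).
        state = fuse_viewpoints(rgbd_view_1, rgbd_view_2)

        # Each network predicts a pixel-wise Q-value map over candidate action locations.
        grasp_q = grasp_net(state)   # shape: (H, W)
        push_q = push_net(state)     # shape: (H, W)

        # Execute the primitive (push or grasp) whose best location has the higher Q-value.
        if grasp_q.max() >= push_q.max():
            return "grasp", np.unravel_index(np.argmax(grasp_q), grasp_q.shape)
        return "push", np.unravel_index(np.argmax(push_q), push_q.shape)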


Summary

Related work

The deep reinforcement learning method combines deep learning and reinforcement learning: deep learning automatically learns abstract features from large-scale input data, and the reinforcement learning part makes decisions. Lillicrap et al. proposed the Deep Deterministic Policy Gradient (DDPG) algorithm based on the Actor-Critic (AC) framework. This algorithm solves deep reinforcement learning problems in continuous action spaces and can handle many physical tasks, such as controlling an inverted pendulum, end-to-end operation (reaching, handling) of a robotic arm, and car driving. Heess et al. [21] proposed the Stochastic Value Gradient (SVG) method, which is aimed at controlling the continuous motion of a robotic arm; it has been used to complete a cap-screwing task similar to the sorting task. Levine et al. [22] proposed a robotic-arm grasping method that does not require hand-eye calibration, trains the parameters of a neural network through a large number of operations, and determines the actions to execute according to a grasp prediction network to complete the grasping task. The network parameters are updated by minimizing the squared error between the current value function and the target value function.
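
As an illustration of that last point, the following is a minimal sketch of the squared-error update used by DQN-style value-based methods, assuming a standard one-step temporal-difference target with discount factor gamma. It is a generic example, not the exact loss used in the paper.

    # Generic squared error between the current Q-value and the one-step
    # temporal-difference target. The discount factor and the way Q-values
    # are produced are illustrative assumptions.
    def td_squared_error(q_current, reward, q_next_max, gamma=0.99, done=False):
        """Squared error between Q(s, a) and the target r + gamma * max_a' Q(s', a')."""
        target = reward if done else reward + gamma * q_next_max
        return (target - q_current) ** 2

    # Example: a grasp that succeeded (reward 1.0) in a terminal state.
    loss = td_squared_error(q_current=0.7, reward=1.0, q_next_max=0.0, done=True)

Minimizing this error over sampled transitions is what drives the value-network update described above.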

Description of state set
Description of action space
Design of value network
Reward configuration
Results and analysis of experiment in simulation environment
Comparative experiment
Tests of model’s generalization ability
Conclusion
Findings
Competing interest