To address the high hardware requirements, poor adaptability to different objects, and large harmful torques of existing robotic grasping systems, a visual detection and grasping method based on deep learning is proposed. A channel attention mechanism is added to YOLO-V3 to strengthen the network's image feature extraction, which improves target detection in complex environments and increases the average recognition rate compared with the original . To address the discreteness of current pose estimation angles, a minimum area bounding rectangle (MABR) algorithm built on a Visual Geometry Group 16 (VGG-16) backbone network is proposed for grasping pose estimation and angle optimization. The average error between the optimized grasping angle and the actual angle of the target is less than , which greatly reduces the harmful torque that the two-finger manipulator applies to the object during grasping. A visual grasping system was built from a UR5 robotic arm, a pneumatic two-finger manipulator, a RealSense D435 camera, and an ATI-Mini45 six-axis force sensor. Experiments show that the proposed method can effectively grasp and classify different objects, has low hardware requirements, and reduces harmful torque by about , thereby reducing damage to objects. The method has good application prospects.
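
The geometric idea behind a minimum area bounding rectangle can be illustrated with a minimal sketch. It is not the proposed MABR algorithm: it covers only the rectangle-fitting step on a set of 2D object points (for example, pixels of a segmented target) and omits the VGG-16 network entirely. The function names `convex_hull` and `min_area_rect` are hypothetical, and reading the long-side direction as the object's orientation is an assumption made here for illustration.

```python
import numpy as np


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return np.array(pts, dtype=float)

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return np.array(lower[:-1] + upper[:-1], dtype=float)


def min_area_rect(points):
    """Return (center, (length, width), angle_deg) of the minimum-area rectangle.

    angle_deg is the direction of the long side, normalised to [0, 180).
    Uses the fact that a minimum-area enclosing rectangle has one side
    collinear with an edge of the convex hull, so only hull edges are tried.
    """
    hull = convex_hull(np.asarray(points, dtype=float))
    if len(hull) < 3:
        raise ValueError("need at least three non-collinear points")

    best = None
    for i in range(len(hull)):
        edge = hull[(i + 1) % len(hull)] - hull[i]
        theta = np.arctan2(edge[1], edge[0])
        c, s = np.cos(theta), np.sin(theta)
        rot = np.array([[c, s], [-s, c]])  # rotates points by -theta
        aligned = hull @ rot.T             # hull with this edge along the x-axis
        mins, maxs = aligned.min(axis=0), aligned.max(axis=0)
        w, h = maxs - mins
        if best is None or w * h < best[0]:
            center = rot.T @ ((mins + maxs) / 2.0)  # back to the original frame
            best = (w * h, center, w, h, theta)

    _, center, w, h, theta = best
    if h > w:  # report the long side first and measure the angle along it
        w, h = h, w
        theta += np.pi / 2.0
    return center, (w, h), float(np.degrees(theta) % 180.0)
```

Because the angle comes from a continuous rectangle fit rather than from a fixed set of angle classes, it is not restricted to discrete values. For a two-finger gripper, one plausible convention (again an assumption, not stated in the text) is to close the fingers perpendicular to the returned long-side direction.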