In the era of industrialization, robots are gradually replacing workers in some stages of production, and there is an irreversible trend toward integrating image processing techniques into robot control. In recent years, vision-based techniques have achieved considerable results. However, most of them require a complicated setup, a specialized camera, and a professional operator to handle the heavy computational burden. In this paper, an efficient vision-based solution for detecting and grasping an object in an indoor environment is introduced. The framework of the system, which comprises the geometrical constraints, the robot control theory, and the hardware platform, is described. The proposed method for the detection and grasping task is presented in detail, from calibration to visual estimation. The results of both theoretical simulation and experiment indicate that the approach is efficient, feasible, and applicable.