This study addresses two problems in traditional 3D art design: large measurement errors introduced by the binocular camera, which make the 3D information of the target inaccurate, and the sensitivity of contour extraction to image noise during human motion pose reconstruction. First, a binocular stereo vision system is built that integrates image acquisition, camera calibration, and image processing; lens distortion correction is applied to the images to reduce measurement errors. Second, a three-dimensional human motion pose reconstruction model is implemented, in which a Gaussian template removes noise from each image frame and a change detection mask (CDM) handles the background “exposure” and “occlusion” problems. Finally, simulation experiments are designed to verify the system and the model. Because the application of pose estimation based on visual sensing technology to art design remains largely unexplored, this research is significant and provides a reference for further work in the field.
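To make the denoising and change detection steps concrete, the following is a minimal sketch, not the implementation used in this study. It assumes grayscale frames stored as NumPy arrays and an odd template size, and it reduces the CDM to a thresholded difference between two smoothed frames. The function names, the 5 × 5 template size, `sigma`, and `threshold` are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Build a normalized 2D Gaussian template of shape (size, size).

    Assumption: `size` is odd so the template has a center pixel.
    """
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()

def smooth(frame, kernel):
    """Filter a grayscale frame with the template, replicating edge pixels."""
    k = kernel.shape[0]
    pad = k // 2
    h, w = frame.shape
    padded = np.pad(frame.astype(np.float64), pad, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    # Accumulate the weighted, shifted copies of the padded frame.
    for dy in range(k):
        for dx in range(k):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def change_detection_mask(prev_frame, curr_frame, kernel, threshold=20.0):
    """Return a boolean mask of pixels that changed between two frames.

    Simplification (assumed here): the mask is a thresholded absolute
    difference of the two smoothed frames.
    """
    diff = np.abs(smooth(curr_frame, kernel) - smooth(prev_frame, kernel))
    return diff > threshold

# Usage: a bright 4 x 4 patch appears in an otherwise static frame.
kernel = gaussian_kernel()
prev_frame = np.zeros((20, 20))
curr_frame = prev_frame.copy()
curr_frame[8:12, 8:12] = 200.0
mask = change_detection_mask(prev_frame, curr_frame, kernel)
```

Smoothing both frames before differencing keeps isolated noisy pixels from exceeding the threshold, which is the reason the Gaussian template precedes the change detection step in this sketch.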
A literature analysis is then used to examine the application of pose estimation based on visual sensing technology in visual communication design and environmental art design. The results show the following: (1) although the binocular stereo vision system introduces some measurement error, the overall error stays within 2%, which indicates that the system is accurate enough to acquire three-dimensional information of the target in art design; (2) the video sequence data produced by the three-dimensional human motion pose reconstruction model fit the real motion data closely, which indicates that the method processes image sequences accurately and is feasible for human pose reconstruction in three-dimensional art design; (3) the existing literature shows that most current vision-based pose estimation studies rely on network cameras combined with computers, and the images obtained are of low quality. Combining binocular stereo sensors with pose estimation technology can be applied to advertising, animation, game, and packaging design, making the behavior of virtual characters in animation and games more vivid. In environmental art design, the combination facilitates the collection of environmental spatial information and object pose information, the formulation of design schemes, and real-time monitoring of construction. This study aims to provide a theoretical basis for the technical upgrading of art design.