Producing and playing back 3D content is complex and inefficient. This article presents a set of solutions to the problems that arise in acquiring, processing, and displaying such content. The proposed framework is divided into three modules: depth map generation and optimization, multi-viewpoint generation and optimization, and 3D content encoding and display. For depth map generation and optimization, we introduce a self-supervised convolutional network built on the PSMNet (Pyramid Stereo Matching Network) architecture to extract accurate depth data. For multi-viewpoint generation and optimization, we propose a novel partial convolutional neural network based on the Edge-PUnet framework, designed to enhance virtual viewpoint images. The depth data generation algorithm achieves an Out-Noc value of 2.27% on the KITTI-2012 dataset, and the virtual viewpoint optimization algorithm attains PSNR and SSIM values of 32.688 and 0.932, respectively, on a street dance sequence. These results suggest that the implemented system has the potential to improve the efficiency of 3D content production while reducing its cost.
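For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of how Out-Noc and PSNR are commonly computed. It is an illustration under stated assumptions, not the evaluation code used in this work. The function names `out_noc` and `psnr` are hypothetical, and images are represented as flat sequences of numbers to keep the example dependency-free. The 3-pixel error threshold (the usual KITTI-2012 default) and the 8-bit peak value of 255 are also assumptions. SSIM is omitted because it requires windowed local statistics.

```python
import math


def out_noc(pred_disp, gt_disp, noc_mask, threshold=3.0):
    """Percentage of non-occluded pixels whose disparity error exceeds `threshold`.

    Assumptions: inputs are flat sequences of equal length, a ground-truth
    disparity of 0 marks a pixel without ground truth, and `noc_mask` is
    truthy for non-occluded pixels.
    """
    errors = [
        abs(p - g)
        for p, g, visible in zip(pred_disp, gt_disp, noc_mask)
        if visible and g > 0
    ]
    if not errors:
        raise ValueError("no valid non-occluded pixels to evaluate")
    bad = sum(1 for e in errors if e > threshold)
    return 100.0 * bad / len(errors)


def psnr(reference, rendered, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Assumption: pixel values lie in [0, max_value]; 255 corresponds to 8-bit data.
    """
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Under these definitions, a lower Out-Noc indicates fewer erroneous disparity estimates in visible regions, while a higher PSNR indicates that a synthesized viewpoint is closer to its reference image.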