The three-dimensional reconstruction of forests is crucial in remote sensing technology, ecological monitoring, and forestry management, as it yields precise forest structure and tree parameters, providing essential data support for forest resource management, evaluation, and sustainable development. Nevertheless, forest 3D reconstruction now encounters obstacles including higher equipment costs, reduced data collection efficiency, and complex data processing. This work introduces a unique deep learning model, CPH-Fmnet, designed to enhance the accuracy and efficiency of 3D reconstruction in intricate forest environments. CPH-Fmnet enhances the FPN Encoder-Decoder Architecture by meticulously incorporating the Channel Attention Mechanism (CA), Path Aggregation Module (PA), and High-Level Feature Selection Module (HFS), alongside the integration of the pre-trained Vision Transformer (ViT), thereby significantly improving the model’s global feature extraction and local detail reconstruction abilities. We selected three representative sample plots in Haidian District, Beijing, China, as the study area and took forest stand sequence photos with an iPhone for the research. Comparative experiments with the conventional SfM + MVS and MVSFormer models, along with comprehensive parameter extraction and ablation studies, substantiated the enhanced efficacy of the proposed CPH-Fmnet model in addressing difficult circumstances such as intricate occlusions, poorly textured areas, and variations in lighting. The test results show that the model does better on a number of evaluation criteria. It has an RMSE of 1.353, an MAE of only 5.1%, an r value of 1.190, and a forest reconstruction rate of 100%, all of which are better than current methods. Furthermore, the model produced a more compact and precise 3D point cloud while accurately determining the properties of the forest trees. The findings indicate that CPH-Fmnet offers an innovative approach for forest resource management and ecological monitoring, characterized by cheap cost, high accuracy, and high efficiency.