Objective. The reconstruction of three-dimensional optical imaging that can quantitatively acquire the target distribution from surface measurements is a serious ill-posed problem. Traditional regularization-based reconstruction can solve such ill-posed problem to a certain extent, but its accuracy is highly dependent on a prior information, resulting in a less stable and adaptable method. Data-driven deep learning-based reconstruction avoids the errors of light propagation models and the reliance on experience and a prior by learning the mapping relationship between the surface light distribution and the target directly from the dataset. However, the acquisition of the training dataset and the training of the network itself are time consuming, and the high dependence of the network performance on the training dataset results in a low generalization ability. The objective of this work is to develop a highly robust reconstruction framework to solve the existing problems. Approach. This paper proposes a physical model constrained neural networks-based reconstruction framework. In the framework, the neural networks are to generate a target distribution from surface measurements, while the physical model is used to calculate the surface light distribution based on this target distribution. The mean square error between the calculated surface light distribution and the surface measurements is then used as a loss function to optimize the neural network. To further reduce the dependence on a priori information, a movable region is randomly selected and then traverses the entire solution interval. We reconstruct the target distribution in this movable region and the results are used as the basis for its next movement. Main Results. The performance of the proposed framework is evaluated with a series of simulations and in vivo experiment, including accuracy robustness of different target distributions, noise immunity, depth robustness, and spatial resolution. The results collectively demonstrate that the framework can reconstruct targets with a high accuracy, stability and versatility. Significance. The proposed framework has high accuracy and robustness, as well as good generalizability. Compared with traditional regularization-based reconstruction methods, it eliminates the need to manually delineate feasible regions and adjust regularization parameters. Compared with emerging deep learning assisted methods, it does not require any training dataset, thus saving a lot of time and resources and solving the problem of poor generalization and robustness of deep learning methods. Thus, the framework opens up a new perspective for the reconstruction of three-dimension optical imaging.