Generalized stereo matching must cope with radiometric differences and small ground-feature changes that arise when images are acquired by different satellites at different times, while texture-less regions and disparity discontinuities seriously degrade the correspondence between matching points. To address these problems, a novel generalized stereo matching method for urban 3D scene reconstruction is proposed, based on iterative optimization of a hierarchical graph structure consistency cost. First, the self-similarity of the images is used to construct k-nearest neighbor graphs. The left-view and right-view graph structures are mapped to the same neighborhood, and a graph structure consistency (GSC) cost is proposed to evaluate their similarity. Then, cross-scale cost aggregation is used to adaptively weight and combine the multi-scale GSC costs. Next, object-based iterative optimization is proposed to correct outliers in pixel-wise matching and mismatches in disparity discontinuity regions: a visibility term and a disparity discontinuity term are iterated to continuously detect occlusions and refine the disparity at object boundaries. Finally, fractal net evolution is used to optimize the disparity map. The effectiveness of the proposed method is verified on the public US3D dataset and on a self-made dataset, and the method is compared with state-of-the-art stereo matching methods.
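To make the idea of comparing graph structures rather than raw intensities concrete, the following is a minimal sketch under stated assumptions, not the GSC cost defined in the paper. It assumes single-band intensity as the only node feature, builds a k-nearest neighbor graph inside each patch, lays each graph over the other patch, and measures how much longer the borrowed edges are than the patch's own nearest-neighbor edges. The names `knn_graph`, `structure_cost`, and `best_disparity`, the forward/backward edge-sum difference, and the winner-takes-all search are all illustrative choices; cross-scale aggregation, occlusion handling, and fractal net evolution are not modeled.

```python
import numpy as np


def pairwise_sq_dist(features):
    """Squared Euclidean distance between every pair of rows of `features`."""
    diff = features[:, None, :] - features[None, :, :]
    return (diff ** 2).sum(axis=-1)


def knn_graph(dist, k):
    """Indices of the k nearest neighbours of each node (the node itself excluded)."""
    d = dist.copy()
    np.fill_diagonal(d, np.inf)  # a node is never its own neighbour
    return np.argsort(d, axis=1, kind="stable")[:, :k]


def structure_cost(left_patch, right_patch, k=4):
    """Hypothetical structure-consistency score for two equally sized patches.

    Returns 0 when both patches induce the same k-NN structure; larger values
    mean the structures agree less. This is an assumed stand-in for GSC.
    """
    fl = np.asarray(left_patch, dtype=np.float64).reshape(-1, 1)
    fr = np.asarray(right_patch, dtype=np.float64).reshape(-1, 1)
    dl, dr = pairwise_sq_dist(fl), pairwise_sq_dist(fr)
    gl, gr = knn_graph(dl, k), knn_graph(dr, k)

    def edge_sum(dist, graph):
        # Total edge length per node when `graph` is laid over `dist`.
        return np.take_along_axis(dist, graph, axis=1).sum(axis=1)

    # Forward: left graph on the right patch vs. the right patch's own graph.
    forward = edge_sum(dr, gl) - edge_sum(dr, gr)
    # Backward: right graph on the left patch vs. the left patch's own graph.
    backward = edge_sum(dl, gr) - edge_sum(dl, gl)
    return float((forward + backward).mean())


def best_disparity(left, right, row, col, half=2, max_disp=8, k=4):
    """Winner-takes-all disparity for one left pixel.

    Assumes rectified images and that every candidate window lies inside
    both images (the caller is responsible for the bounds).
    """
    rows = slice(row - half, row + half + 1)
    ref = left[rows, col - half: col + half + 1]
    costs = []
    for d in range(max_disp + 1):
        c = col - d  # left column x is compared with right column x - d
        cand = right[rows, c - half: c + half + 1]
        costs.append(structure_cost(ref, cand, k))
    return int(np.argmin(costs))
```

Because the score depends only on which pixels are each other's nearest neighbors, a positive linear change in brightness and contrast between the two views leaves it at zero in this sketch, which is the kind of robustness to radiometric differences that motivates a structure-based cost.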