Estimating lighting from standard images can effectively circumvent the need for resource-intensive high-dynamic-range (HDR) lighting acquisition. However, this task is often ill-posed and challenging, particularly for indoor scenes, due to the intricacy and ambiguity of indoor illumination sources. We propose SGformer, a transformer-based method that estimates lighting by modeling spherical Gaussian (SG) distributions, a compact yet expressive lighting representation. Diverging from previous approaches, we explore the underlying local and global dependencies in lighting features, which are crucial for reliable lighting estimation. Additionally, we investigate the structural relationships spanning SG distributions of various resolutions, from sparse to dense, aiming to enhance structural consistency and curtail the stochastic noise that can arise from regressing SG components independently. By combining local-global lighting representation learning with consistency constraints across SG resolutions, the proposed method yields more accurate lighting estimates, enabling more realistic lighting effects in object relighting and composition. Our code and models are available at https://github.com/junhong-jennifer-zhao/SGformer.
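To make the representation concrete, the sketch below evaluates a mixture of spherical Gaussian lobes using the standard SG form G(v) = μ · exp(λ(v·ξ − 1)), where ξ is the unit lobe axis, λ the sharpness, and μ the RGB amplitude. This is an illustrative rendering of the general SG formulation, not code from the SGformer repository; all function and parameter names are hypothetical.

```python
import numpy as np

def eval_sg(v, axis, sharpness, amplitude):
    """Evaluate one spherical Gaussian lobe G(v) = mu * exp(lambda * (v . xi - 1)).

    v:         (N, 3) array of unit view directions
    axis:      (3,)   unit lobe axis xi
    sharpness: scalar lambda (larger = narrower lobe)
    amplitude: (3,)   RGB amplitude mu
    Returns an (N, 3) array of RGB radiance values.
    """
    cos_angle = v @ axis  # cosine between each direction and the lobe axis
    return np.exp(sharpness * (cos_angle - 1.0))[:, None] * amplitude

def eval_sg_mixture(v, lobes):
    """Sum a set of SG lobes (axis, sharpness, amplitude) into per-direction radiance."""
    out = np.zeros((v.shape[0], 3))
    for axis, sharpness, amplitude in lobes:
        out += eval_sg(v, axis, sharpness, amplitude)
    return out
```

A lobe reaches its peak amplitude μ along its axis and falls off smoothly away from it, which is why a modest number of SG components can summarize an indoor environment map compactly.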