Ensemble perception is an important human ability that allows observers to extract summary information from scenes and environments whose content far exceeds the processing limit of the visual system. Although attention has been shown to bias ensemble perception, two important questions remain unresolved: (1) whether direct manipulations of different types of spatial attention produce similar effects on ensembles and (2) whether factors that potentially influence the distribution of attention, such as depth perception, can exert an indirect attentional effect on ensemble representation. This study addresses these questions. In Experiments 1 and 2, two types of precues were used to evoke exogenous and endogenous attention, respectively, and ensemble color perception was examined. We found that both exogenous and endogenous attention biased the ensemble representation toward the attended items, with endogenous attention producing the greater effect. In Experiments 3 and 4, we examined whether depth perception could affect color ensembles by indirectly influencing attention allocation in 3D space. The items were separated into two depth planes, and no explicit cues were applied. The results showed that the color ensemble was biased toward closer items when depth information was task relevant. This suggests that ensemble perception is naturally biased in 3D space, probably through the mechanism of attention. Computational modeling consistently showed that attention exerted a direct shift on the ensemble statistics rather than averaging the feature values over the cued and noncued items, providing evidence against an averaging process over individual percepts.
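
The distinction between a direct shift and an averaging process can be illustrated with a minimal sketch. This is not the study's fitted model: the function names, the parameters `w_cued` and `shift`, and the treatment of color as a value on a linear scale are all assumptions made for illustration only.

```python
from statistics import mean


def averaging_model(cued, noncued, w_cued):
    """Hypothetical averaging account: the reported ensemble is a weighted
    mean of individual item values, with each cued item weighted w_cued
    and each noncued item weighted 1."""
    total = w_cued * sum(cued) + sum(noncued)
    norm = w_cued * len(cued) + len(noncued)
    return total / norm


def shift_model(cued, noncued, shift):
    """Hypothetical direct-shift account: the unweighted ensemble mean is
    displaced by a fixed amount toward the cued items."""
    base = mean(cued + noncued)
    direction = mean(cued) - base
    if direction > 0:
        return base + shift
    if direction < 0:
        return base - shift
    return base


if __name__ == "__main__":
    # Two illustrative displays (made-up values, not data from the study):
    # the cued items sit either near to or far from the noncued items.
    near = ([40.0, 50.0], [10.0, 20.0])
    far = ([70.0, 80.0], [10.0, 20.0])
    for cued, noncued in (near, far):
        base = mean(cued + noncued)
        avg_bias = averaging_model(cued, noncued, w_cued=3.0) - base
        shift_bias = shift_model(cued, noncued, shift=5.0) - base
        print(f"averaging bias = {avg_bias}, shift bias = {shift_bias}")
```

Under these assumptions the two accounts make different predictions: the bias of the averaging sketch grows with the separation between cued and noncued items (7.5, then 15.0), whereas the bias of the shift sketch stays fixed (5.0 in both displays).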