With macrocell base stations (MBSs) providing basic coverage for mobile users, multiple tiers of cache-enabled small-cell base stations (SBSs) can opportunistically form user-centric clusters to enhance network capacity through traffic offloading and cooperative transmission. In this paper, we investigate cooperative transmission in cache-enabled heterogeneous networks, taking into account the impact of base station (BS) heights. Specifically, user-centric cooperative SBS clusters are formed based on the cached contents, the transmission distance, and the cell load at the SBSs. Users that cannot be offloaded to a cooperative SBS cluster are served by the nearest MBS. By incorporating the COST 231 Hata model for the line-of-sight and non-line-of-sight channels, explicit expressions for the average spectral efficiency (SE) are obtained, together with statistical characterizations of the cell load distribution and of the aggregated information and interference signal strengths. The analytical results indicate that, with the COST 231 Hata model and the cooperative SBS cluster, the average SE decreases as the BS height increases. Moreover, different tradeoffs exist as the cache size, SBS density, and cooperative distance threshold vary, which result in an average SE that is bell-shaped with respect to (w.r.t.) the SBS density and the cooperative distance threshold. In addition, with an appropriate cooperative distance threshold, the average SE exhibits a bell-shaped relationship w.r.t. the cache size. Extensive simulations are conducted to validate the analytical results and to demonstrate the impact of the network parameters.
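
To make the association rule concrete, the following is a minimal sketch of user-centric cluster formation with MBS fallback. It is an illustration under stated assumptions, not the paper's exact procedure: the names `BaseStation`, `form_cluster`, and `serve`, the parameter `max_load`, and the strict load comparison are all hypothetical, and using the 3D distance (horizontal offset plus BS height) as the transmission distance is likewise an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class BaseStation:
    # Hypothetical container; field names are assumptions for illustration.
    x: float
    y: float
    height: float = 0.0              # antenna height above the user plane
    cache: frozenset = frozenset()   # identifiers of cached contents
    load: int = 0                    # number of users currently served


def distance_3d(user_xy, bs):
    """Transmission distance from a ground-level user to the BS antenna."""
    dx = bs.x - user_xy[0]
    dy = bs.y - user_xy[1]
    return math.sqrt(dx * dx + dy * dy + bs.height ** 2)


def form_cluster(user_xy, content, sbss, dist_threshold, max_load):
    """Select SBSs that cache the content, lie within the cooperative
    distance threshold, and still have spare capacity (assumed rule)."""
    return [
        s for s in sbss
        if content in s.cache
        and s.load < max_load
        and distance_3d(user_xy, s) <= dist_threshold
    ]


def serve(user_xy, content, sbss, mbss, dist_threshold, max_load):
    """Return the serving tier and the serving BS(s) for one request."""
    cluster = form_cluster(user_xy, content, sbss, dist_threshold, max_load)
    if cluster:
        return "SBS", cluster
    # Offloading failed: fall back to the nearest MBS.
    nearest = min(mbss, key=lambda m: distance_3d(user_xy, m))
    return "MBS", [nearest]
```

In this sketch, a larger `dist_threshold` admits more cooperating SBSs but also more distant ones, and a larger cache raises the chance that a nearby SBS holds the requested content; these are the kinds of tradeoffs the analysis quantifies.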