Ultra-high-definition 16K Virtual Reality (VR) video is coming of age, promising a more "real" virtual experience with less cybersickness. However, its huge bitrate and decoding overhead would overwhelm today's networks and mobile hardware. The widely known Field-of-View (FoV) adaptive streaming method still wastes substantial bitrate and incurs high decoding overhead because it delivers FoV areas with grid-like static tiles. Motivated by this, we present a novel Shift-Tile-traCking (STC) streaming scheme, which crops and delivers tiles that track the FoV movement. This is equivalent to delivering a planar FoV video instead of the full VR video, which saves substantial bitrate and reduces decoding complexity. We make three main contributions. 1) To reduce projection distortion, we propose a novel FoV-centric sphere projection, which projects the VR video around the center of the user's FoV. 2) To cover diverse FoV movement trajectories with a limited number of tiles, we propose an optimal tiling algorithm based on trajectory clustering. 3) To be resilient to FoV prediction errors, we propose an accuracy-sensitive streaming algorithm, which scales the delivered FoV area according to the prediction accuracy. The evaluation shows that, under the same network conditions, STC improves V-PSNR by up to 1.3 dB, reduces the buffering ratio by up to 13.2%, and achieves 60% faster decoding (61.5 frames per second) compared with state-of-the-art solutions.
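To make the accuracy-sensitive scaling idea concrete, the following is a minimal sketch, not taken from the paper, of one plausible way to enlarge the delivered FoV area as prediction accuracy drops; the function name, the scalar accuracy score in [0, 1], and the maximum angular margin are all illustrative assumptions.

```python
def scaled_fov_extent(base_fov_deg: float,
                      pred_accuracy: float,
                      max_margin_deg: float = 30.0) -> float:
    """Return the angular extent (degrees) of the FoV area to deliver.

    Lower prediction accuracy -> larger safety margin around the
    predicted viewport center, so that the actual viewport is still
    covered even if the FoV prediction drifts.
    """
    # Clamp the accuracy score to [0, 1] to keep the margin well defined.
    acc = min(max(pred_accuracy, 0.0), 1.0)
    # Margin shrinks to zero as accuracy approaches 1.
    margin = (1.0 - acc) * max_margin_deg
    # Add the margin on both sides of the base FoV extent.
    return base_fov_deg + 2.0 * margin


# Example: a 110-degree base FoV with 0.8 prediction accuracy
# would be delivered as a 122-degree area under these assumptions.
print(scaled_fov_extent(110.0, 0.8))
```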