Handheld scanning using commodity depth cameras provides a flexible and low-cost manner to get 3D models. The existing methods scan a target by densely fusing all the captured depth images, yet most frames are redundant. The jittering frames inevitably embedded in handheld scanning process will cause feature blurring on the reconstructed model and even trigger the scan failure (i.e., camera tracking losing). To address these problems, in this paper, we propose a novel sparse-sequence fusion (SSF) algorithm for handheld scanning using commodity depth cameras. It first extracts related measurements for analyzing camera motion. Then based on these measurements, we progressively construct a supporting subset for the captured depth image sequence to decrease the data redundancy and the interference from jittering frames. Since SSF will reveal the intrinsic heavy noise of the original depth images, our method introduces a refinement process to eliminate the raw noise and recover geometric features for the depth images selected into the supporting subset. We finally obtain the fused result by integrating the refined depth images into the truncated signed distance field (TSDF) of the target. Multiple comparison experiments are conducted and the results verify the feasibility and validity of SSF for handheld scanning with a commodity depth camera.