We propose a novel spatiotemporal graphical model for unsupervised video object segmentation. The core of our model is a layered conditional random field (layered-CRF) with two layers: a pixel layer and a supervoxel layer. First, heat-diffusion-based segmentation and salient region detection are integrated to obtain the segmentation of the first frame. This result is used as input seeds to train dual probabilistic models for each object class. Within the spatiotemporal layered-CRF framework, we extend binary segmentation to multiple-object segmentation. We add an intra-frame spatial matching potential and an inter-frame temporal supervoxel consistency potential to link the pixel layer and the supervoxel layer, which improves spatiotemporal smoothing throughout the video sequence. Because the proposed method is unsupervised, it reduces the burden of labeling training samples, and it yields smooth and accurate object boundaries in video segmentation. Experiments on two public datasets demonstrate that our method outperforms several state-of-the-art methods in both single- and multiple-foreground cases.
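
For orientation, a two-layer CRF of this kind can be written as a single energy over pixel labels and supervoxel labels. The form below is only a generic sketch under stated assumptions, not the formulation used in this work: the symbols $x_i$ (label of pixel $i$), $y_s$ (label of supervoxel $s$), $K$ (number of object classes), the potentials $\psi$ and $\phi$, the neighborhood sets $\mathcal{N}$ and $\mathcal{T}$, the assignment $s(i)$, and the weights $\lambda$ are all hypothetical names introduced here for illustration.

```latex
% Hypothetical layered-CRF energy; every symbol here is an illustrative assumption.
% x_i, y_s \in \{1, \dots, K\}: multi-object labels for pixel i and supervoxel s.
\begin{equation*}
E(\mathbf{x}, \mathbf{y}) =
    \underbrace{\sum_{i} \psi^{P}_{i}(x_i)
      + \sum_{(i,j) \in \mathcal{N}_{P}} \psi^{P}_{ij}(x_i, x_j)}_{\text{pixel layer}}
  + \underbrace{\sum_{s} \psi^{V}_{s}(y_s)}_{\text{supervoxel layer}}
  + \lambda_{\mathrm{sp}} \sum_{i} \phi^{\mathrm{sp}}\bigl(x_i, y_{s(i)}\bigr)
  + \lambda_{\mathrm{tm}} \sum_{(s,t) \in \mathcal{T}} \phi^{\mathrm{tm}}(y_s, y_t)
\end{equation*}
% s(i): supervoxel containing pixel i (assumed assignment).
% \mathcal{N}_P: intra-frame pixel neighbors; \mathcal{T}: temporally adjacent supervoxel pairs.
```

In this sketch, the unary terms would come from the per-class probabilistic models seeded by the first-frame segmentation, $\phi^{\mathrm{sp}}$ stands in for the intra-frame spatial matching potential, and $\phi^{\mathrm{tm}}$ stands in for the inter-frame temporal supervoxel consistency potential. The exact arguments and weighting of these terms are assumptions made here.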