We present an approach for automatically extracting the salient object from videos. Given an unannotated video sequence, the proposed method first computes visual saliency to identify object-like regions in each frame, using the proposed weighted multiple manifold ranking algorithm. It then computes motion cues to estimate motion saliency and a localization prior. Finally, using a new energy function, it estimates a superpixel-level object labeling across all frames, where 1) the data term depends on the visual saliency and the localization prior, and 2) the smoothness term depends on spatial and temporal constraints. Compared with existing methods, the proposed approach automatically segments the persistent foreground object while preserving its shape. Experiments on challenging benchmark videos show promising results relative to these methods.
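
As a rough illustration of the labeling step, an energy with a data term and a spatio-temporal smoothness term is commonly written in the generic pairwise form below. This is only a sketch under assumed notation, not the exact energy function of the proposed method: the symbols $l_i$ (binary label of superpixel $i$), $s_i$ (visual saliency), $p_i$ (localization prior), $\mathcal{N}_s$ and $\mathcal{N}_t$ (spatial and temporal neighbor pairs), and the weight $\lambda$ are all introduced here for illustration.

```latex
% Generic pairwise labeling energy (assumed form, for illustration only).
% \mathcal{V}: all superpixels across all frames; l_i = 1 for object, 0 for background.
\[
E(\mathbf{l}) \;=\;
\underbrace{\sum_{i \in \mathcal{V}} D_i\!\left(l_i \mid s_i, p_i\right)}_{\text{data term}}
\;+\;
\lambda
\underbrace{\sum_{(i,j) \in \mathcal{N}_s \cup \mathcal{N}_t} V_{ij}\!\left(l_i, l_j\right)}_{\text{smoothness term}},
\qquad l_i \in \{0, 1\}.
\]
```

In this reading, $D_i$ penalizes labels that disagree with the visual saliency and the localization prior of superpixel $i$, while $V_{ij}$ penalizes neighboring superpixels, within a frame or across adjacent frames, that receive different labels. The object labeling is then the assignment $\mathbf{l}$ that minimizes $E$.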