Weakly supervised multi-class video segmentation is one of the most challenging yet least studied research problems in computer vision. This study investigates two main questions: (1) how to update features effectively for temporal changes while reusing features across frames; and (2) how to learn object patterns in complex video scenes under weak supervision. Associating image tags with visual appearance is not a straightforward learning task, especially for complex scenes. Therefore, in this paper, we present manifold augmentations to obtain reliable pixel labels from image tags. We propose a framework comprising two key modules: a temporal split module for efficient video processing and a pseudo per-pixel seed generation module for precise pixel-level supervision. In particular, our model exploits temporal correlations via the temporal split module and temporal attention. To reuse the extracted features and incorporate temporal updates for precise and fast computation, we present a channel-wise temporal split mechanism between successive video frames. Furthermore, we evaluate the proposed modules in two additional settings: (1) fully or sparsely supervised road scene video segmentation; and (2) weakly supervised segmentation of complex road scene images. Experiments are conducted on the Cityscapes and CamVid datasets, using DeepLabv3 as the segmentation network and LiteFlowNet to compute motion vectors.
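
To make the idea of a channel-wise temporal split concrete, the following is a minimal sketch and not the implementation used in this work. It assumes that a fixed leading block of feature channels is copied from the previous frame and only the remaining channels are recomputed for the current frame. The names `temporal_split`, `toy_channel`, and `num_updated` are hypothetical, and the toy extractor stands in for a real segmentation backbone.

```python
def temporal_split(prev_features, compute_channel, frame, num_updated):
    """Reuse leading channels from the previous frame; recompute the rest.

    prev_features: list of channels (each a list of floats) from frame t-1.
    compute_channel: hypothetical callable (frame, channel_index) -> channel.
    num_updated: number of trailing channels to recompute for the new frame.
    """
    total = len(prev_features)
    if not 0 <= num_updated <= total:
        raise ValueError("num_updated must be between 0 and the channel count")
    num_reused = total - num_updated
    # Reused channels are copied, so no computation is spent on them.
    reused = [list(channel) for channel in prev_features[:num_reused]]
    # Only the remaining channels are recomputed from the current frame.
    updated = [compute_channel(frame, c) for c in range(num_reused, total)]
    return reused + updated


def toy_channel(frame, c):
    # Assumption for illustration: channel c scales each pixel by (c + 1).
    return [pixel * (c + 1) for pixel in frame]


frame_t0 = [1.0, 2.0]
frame_t1 = [3.0, 4.0]
features_t0 = [toy_channel(frame_t0, c) for c in range(4)]
features_t1 = temporal_split(features_t0, toy_channel, frame_t1, num_updated=2)
```

In this sketch the cost per frame scales with `num_updated` rather than with the full channel count, which is the kind of saving that feature reuse is intended to provide. How the split ratio is chosen, and which channels are refreshed, is left open here.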