Recent advances in sensor technology have introduced low-cost RGB-D sensors, such as the Kinect, which acquire color and depth images simultaneously at video rates. This paper introduces a framework for representing general dynamic scenes captured as video plus depth. A hybrid representation is proposed that combines the advantages of prior surfel-graph surface segmentation and modeling work with the higher-resolution surface reconstruction capability of volumetric fusion techniques. The contributions are: 1) extension of a prior piecewise surfel-graph modeling approach for improved accuracy and completeness; 2) combination of this surfel-graph modeling with truncated signed distance function (TSDF) surface fusion to generate dense geometry; and 3) a method for validating the reconstructed 4D scene model against the input data, together with efficient storage of any unmodeled regions via residual depth maps. The approach allows arbitrary dynamic scenes to be represented efficiently with a temporally consistent structure and enhanced detail and completeness where possible, while falling back gracefully to raw measurements where no structure can be inferred. The representation is shown to facilitate creative manipulation of real scene data that would previously have required more complex capture set-ups or manual processing.
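The volumetric fusion mentioned above rests on the standard TSDF update: each voxel stores a truncated signed distance to the observed surface, blended across frames by a running weighted average. The sketch below illustrates one such fusion step for a single depth map; the function name, the simple pinhole projection, and the per-observation weight of 1 are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def tsdf_update(tsdf, weights, depth, K, T, voxel_coords, trunc=0.05):
    """One fusion step (illustrative): project voxel centres into the depth
    image, compute truncated signed distances, and blend with running weights.

    tsdf, weights  : (N,) running TSDF values and observation weights
    depth          : (H, W) depth map in metres (0 = invalid)
    K              : 3x3 pinhole intrinsics, T : 4x4 world-to-camera pose
    voxel_coords   : (N, 3) voxel centres in world coordinates
    """
    # Transform world-space voxel centres into the camera frame.
    cam = (T @ np.c_[voxel_coords, np.ones(len(voxel_coords))].T)[:3].T
    z = cam[:, 2]
    # Project to pixel coordinates with the pinhole model.
    u = np.round(K[0, 0] * cam[:, 0] / z + K[0, 2]).astype(int)
    v = np.round(K[1, 1] * cam[:, 1] / z + K[1, 2]).astype(int)
    h, w = depth.shape
    valid = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    d = np.where(valid, depth[v.clip(0, h - 1), u.clip(0, w - 1)], 0.0)
    valid &= d > 0  # discard invalid (zero) depth readings
    # Signed distance along the viewing ray, truncated to [-trunc, trunc].
    sdf = np.clip(d - z, -trunc, trunc)
    # Weighted running average, weight 1 per new observation.
    tsdf[valid] = (tsdf[valid] * weights[valid] + sdf[valid]) / (weights[valid] + 1)
    weights[valid] += 1
    return tsdf, weights
```

A dense surface can then be extracted from the fused volume with marching cubes; the temporally consistent surfel-graph structure determines how the volume is deformed between frames.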
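The residual depth maps used to store unmodeled regions can be thought of as the per-pixel disagreement between the input depth and the depth rendered from the reconstructed model, keeping only samples the model fails to explain. The following is a minimal sketch of that idea; the function name and the fixed tolerance are assumptions for illustration.

```python
import numpy as np

def residual_depth(input_depth, model_depth, tol=0.01):
    """Retain only input depth samples not explained by the model
    (illustrative sketch; tolerance `tol` in metres is assumed).
    Zeros in the input are treated as invalid readings."""
    unexplained = np.abs(input_depth - model_depth) > tol
    unexplained &= input_depth > 0
    # Unexplained pixels keep their raw measurement; explained pixels are 0.
    return np.where(unexplained, input_depth, 0.0)
```

Because most pixels of a well-modeled scene are explained, such residual maps are sparse and compress well, which is what makes storing the fallback raw measurements efficient.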