Light field videos captured in RGB frames (RGB-LFV) can provide users with a six-degree-of-freedom immersive video experience by capturing dense multi-subview video. Despite their potential benefits, processing dense multi-subview video is extremely resource-intensive, which currently limits RGB-LFV frame rates (to below 30 fps) and results in blurred frames when capturing fast motion. To address this issue, we propose leveraging event cameras, which provide high temporal resolution for capturing fast motion. However, the cost of current event camera models makes it prohibitively expensive to equip RGB-LFV platforms with multiple event cameras. We therefore propose EV-LFV, an event synthesis framework that generates full multi-subview event streams for RGB-LFV using only one event camera and multiple traditional RGB cameras. EV-LFV employs spatial-angular convolution, ConvLSTM, and a Transformer to model the angular features, temporal features, and long-range dependencies of RGB-LFV, respectively, in order to effectively synthesize event streams for RGB-LFV. To train EV-LFV, we construct the first event-to-LFV dataset, consisting of 200 RGB-LFV sequences with ground-truth event streams. Experimental results demonstrate that EV-LFV outperforms state-of-the-art event synthesis methods in generating event-based RGB-LFV, effectively alleviating motion blur in the reconstructed RGB-LFV.
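As a rough illustration of the pipeline named in the abstract, the following PyTorch sketch combines a spatial-angular convolution block, a ConvLSTM cell, and a Transformer encoder to predict per-subview event voxel grids from multi-subview RGB frames. All module names, tensor layouts, hyperparameters, and the voxel-grid output format are illustrative assumptions, not the authors' implementation of EV-LFV.

```python
# Illustrative sketch only: layer choices, shapes, and the event-voxel output
# format are assumptions, not the published EV-LFV architecture.
import torch
import torch.nn as nn


class SpatialAngularConv(nn.Module):
    """Alternate a spatial conv over each subview with an angular conv
    over the (assumed square) subview grid."""

    def __init__(self, channels):
        super().__init__()
        self.spatial = nn.Conv2d(channels, channels, 3, padding=1)
        self.angular = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):                       # x: (B, A, C, H, W)
        b, a, c, h, w = x.shape
        u = v = int(a ** 0.5)                   # assume a square U x V subview grid
        # Spatial conv: treat every subview as an independent image.
        s = self.spatial(x.reshape(b * a, c, h, w)).reshape(b, a, c, h, w)
        # Angular conv: convolve each pixel's stack of subviews over the grid.
        ang = s.permute(0, 3, 4, 2, 1).reshape(b * h * w, c, u, v)
        ang = self.angular(ang).reshape(b, h, w, c, a).permute(0, 4, 3, 1, 2)
        return ang


class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell for per-subview temporal modelling."""

    def __init__(self, channels):
        super().__init__()
        self.gates = nn.Conv2d(2 * channels, 4 * channels, 3, padding=1)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], 1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)


class EVLFVSketch(nn.Module):
    """Spatial-angular features -> ConvLSTM over time -> Transformer over
    subview tokens -> per-frame event voxel grids (hypothetical head)."""

    def __init__(self, channels=32, event_bins=5):
        super().__init__()
        self.embed = nn.Conv2d(3, channels, 3, padding=1)
        self.sa_conv = SpatialAngularConv(channels)
        self.lstm = ConvLSTMCell(channels)
        layer = nn.TransformerEncoderLayer(channels, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Conv2d(channels, event_bins, 3, padding=1)

    def forward(self, frames):                  # frames: (B, T, A, 3, H, W)
        b, t, a, _, hgt, wdt = frames.shape
        h = frames.new_zeros(b * a, self.head.in_channels, hgt, wdt)
        c = torch.zeros_like(h)
        outputs = []
        for step in range(t):
            x = self.embed(frames[:, step].reshape(b * a, 3, hgt, wdt))
            x = self.sa_conv(x.reshape(b, a, -1, hgt, wdt))
            x, (h, c) = self.lstm(x.reshape(b * a, -1, hgt, wdt), (h, c))
            # Long-range dependency across subviews: pool each view to a token.
            tokens = self.transformer(x.reshape(b, a, -1, hgt, wdt).mean((3, 4)))
            x = x + tokens.reshape(b * a, -1, 1, 1)
            outputs.append(self.head(x).reshape(b, a, -1, hgt, wdt))
        return torch.stack(outputs, 1)          # (B, T, A, bins, H, W)


if __name__ == "__main__":
    # Toy run: 2 frames of a 2x2-subview light field at 32x32 resolution.
    events = EVLFVSketch()(torch.randn(1, 2, 4, 3, 32, 32))
    print(events.shape)                         # torch.Size([1, 2, 4, 5, 32, 32])
```

Pooling each subview to a single token before the Transformer is only one way to model long-range dependencies cheaply; the actual framework may attend over richer spatial-angular tokens.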