Accurate displacement measurement provides crucial information for evaluating structural health. Direct and indirect displacement measurement methods each have limitations, which can be remedied by leveraging the strengths of the other. In this study, a novel accelerometer-assisted computer vision data fusion framework is developed for accurately reconstructing structural dynamic displacement. The framework requires neither a specific mathematical model nor prior knowledge of the data’s characteristics, so it can be applied across different applications without the constraints of model assumptions. At its core, the framework integrates successive variational mode decomposition (SVMD) with Bayesian optimization to adaptively determine the weight factors of the fusion components. In addition, an enhanced optical flow approach is presented for converting the pixel movement of natural targets into structural displacement. This approach effectively reduces the selection of mismatched points within the region of interest (ROI), thereby reducing drift errors. The developed framework is verified via shaking table tests of a reinforced concrete frame structure under seismic excitation. The results indicate that the framework accurately estimates structural dynamic displacement. Compared with vision-only displacement identification results, the proposed framework yields a lower peak error (< 1.65 mm) and a lower normalized root mean square error (< 0.30). Moreover, by incorporating the dynamic displacement component derived from acceleration measurements, the reconstructed displacement covers a wider frequency range than the vision-based displacement.
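The two error metrics quoted above can be illustrated with a minimal Python sketch. The abstract does not state how the root mean square error is normalized, so the sketch assumes normalization by the range of the reference displacement; the function names `peak_error` and `nrmse` and the synthetic signals are hypothetical and are not taken from the study.

```python
import numpy as np


def peak_error(estimate, reference):
    """Largest absolute sample-wise deviation, in the units of the inputs."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.max(np.abs(estimate - reference)))


def nrmse(estimate, reference):
    """RMSE divided by the range of the reference signal.

    Assumption: range normalization. Other conventions (for example,
    dividing by the RMS of the reference) give different values.
    """
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    rmse = np.sqrt(np.mean((estimate - reference) ** 2))
    return float(rmse / (reference.max() - reference.min()))


# Synthetic illustration only: a 2 Hz, 10 mm amplitude reference displacement
# and an estimate carrying a constant 0.5 mm offset.
t = np.linspace(0.0, 1.0, 201)
reference = 10.0 * np.sin(2.0 * np.pi * 2.0 * t)  # mm
estimate = reference + 0.5                        # mm

print(f"peak error: {peak_error(estimate, reference):.3f} mm")
print(f"NRMSE:      {nrmse(estimate, reference):.4f}")
```

In this synthetic case the peak error equals the 0.5 mm offset, and the NRMSE is that offset divided by the 20 mm reference range. With a different normalization convention the NRMSE would change while the peak error would not.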