Interventional cone-beam CT (CBCT) offers 3D visualization of soft-tissue and vascular anatomy, enabling 3D guidance of abdominal interventions. However, its long acquisition time makes CBCT susceptible to patient motion. Image-based autofocus offers a suitable platform for compensation of deformable motion in CBCT, but it relies on handcrafted motion metrics that are based on first-order image properties and lack awareness of the underlying anatomy. This work proposes a data-driven approach to motion quantification via a learned, context-aware, deformable metric that quantifies the degree of motion degradation as well as the realism of the structural anatomical content in the image.

The proposed metric was modeled as a deep convolutional neural network (CNN) trained to recreate a reference-based structural similarity metric, visual information fidelity (VIF). The deep CNN acted on motion-corrupted images, providing an estimate of the spatial VIF map that would be obtained against a motion-free reference, thus capturing both motion distortion and anatomical plausibility. The deep CNN featured a multi-branch architecture with a high-resolution branch for estimation of voxel-wise VIF on a small volume of interest, and a second, contextual, low-resolution branch that provided features associated with anatomical context for disentanglement of motion effects from anatomical appearance. The deep CNN was trained on paired motion-free and motion-corrupted data obtained with a high-fidelity forward projection model for a protocol at 120 kV and 9.90 mGy.

Performance was evaluated via metrics of correlation with the ground truth and with the underlying deformable motion field in simulated data with motion amplitudes ranging from 5 to 20 mm and frequencies from 2.4 to 4 cycles/scan. Robustness to variation in tissue contrast and noise levels was assessed in simulation studies with varying beam energy (90-120 kV) and dose (1.19-39.59 mGy). Further validation was obtained in experimental studies with a deformable phantom. Final validation was obtained by integrating the metric into an autofocus compensation framework applied to motion compensation of experimental datasets, evaluated via metrics of spatial resolution on soft-tissue boundaries and sharpness of contrast-enhanced vascularity.

The magnitude and spatial map of the learned metric showed consistently high correlation with the ground truth in both simulated and real data, yielding average normalized cross-correlation (NCC) values of 0.95 and 0.88, respectively. Similarly, the metric achieved good correlation with the underlying motion field, with an average NCC of 0.90. In experimental phantom studies, the metric properly reflected changes in motion amplitude and frequency: voxel-wise averaging of the local metric across the full reconstructed volume yielded an average value of 0.69 for mild motion (2 mm, 12 cycles/scan) and 0.29 for severe motion (12 mm, 6 cycles/scan). Autofocus motion compensation using the learned metric resulted in noticeable mitigation of motion artifacts and improved spatial resolution of soft-tissue and high-contrast structures, with reductions in edge spread function width of 8.78% and 9.20%, respectively. Motion compensation also increased the conspicuity of contrast-enhanced vascularity, reflected in a 9.64% increase in vessel sharpness.
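To make the multi-branch design concrete, the following is a minimal sketch, not the authors' implementation: it assumes a PyTorch model (hypothetical class `ContextAwareVIFNet`) in which a high-resolution branch processes a small VOI while a pooled low-resolution branch contributes a global anatomical-context vector, fused to predict a voxel-wise VIF map bounded in [0, 1]. All layer sizes and the fusion scheme are illustrative assumptions.

```python
# Hedged sketch (not the paper's architecture) of a two-branch 3D CNN that
# maps a motion-corrupted VOI plus a low-resolution context volume to a
# voxel-wise VIF-like map in [0, 1].
import torch
import torch.nn as nn


def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Two 3D convolutions with ReLU, preserving spatial size."""
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )


class ContextAwareVIFNet(nn.Module):
    """High-resolution branch estimates voxel-wise VIF on a small VOI;
    a low-resolution branch supplies a global context feature vector that is
    broadcast over the VOI grid and fused (hypothetical fusion scheme)."""

    def __init__(self, feat: int = 32):
        super().__init__()
        self.voi_branch = conv_block(1, feat)        # high-res VOI features
        self.context_branch = nn.Sequential(         # low-res context features
            conv_block(1, feat),
            nn.AdaptiveAvgPool3d(1),                 # global context vector
        )
        self.head = nn.Sequential(                   # fused -> VIF map in [0, 1]
            conv_block(2 * feat, feat),
            nn.Conv3d(feat, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, voi: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        f_voi = self.voi_branch(voi)
        f_ctx = self.context_branch(context)         # shape (B, feat, 1, 1, 1)
        f_ctx = f_ctx.expand_as(f_voi)               # broadcast over the VOI grid
        return self.head(torch.cat([f_voi, f_ctx], dim=1))


if __name__ == "__main__":
    net = ContextAwareVIFNet()
    voi = torch.randn(1, 1, 32, 32, 32)    # motion-corrupted VOI
    ctx = torch.randn(1, 1, 64, 64, 64)    # downsampled full-volume context
    print(net(voi, ctx).shape)             # torch.Size([1, 1, 32, 32, 32])
```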
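The NCC figures reported above follow the standard zero-mean formulation; a brief sketch (hypothetical helper `ncc`, assuming NumPy arrays of matching shape):

```python
# Zero-mean normalized cross-correlation between a predicted VIF map and the
# ground-truth map computed against a motion-free reference; result in [-1, 1].
import numpy as np


def ncc(pred: np.ndarray, target: np.ndarray, eps: float = 1e-12) -> float:
    p = pred - pred.mean()
    t = target - target.mean()
    return float((p * t).sum() / (np.linalg.norm(p) * np.linalg.norm(t) + eps))
```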
The proposed metric, featuring a novel context-aware architecture, demonstrated its capacity as a reference-free surrogate of structural similarity, quantifying motion-induced degradation of image quality and the anatomical plausibility of image content. The validation studies showed robust performance across motion patterns, x-ray techniques, and anatomical instances. The proposed anatomy- and context-aware metric offers a powerful alternative to conventional motion estimation metrics and a step forward for the application of deep autofocus motion compensation to guidance in clinical interventional procedures.
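For context on how such a metric drives autofocus compensation, here is a hedged sketch of a generic gradient-free autofocus loop. `reconstruct` and `metric_fn` are hypothetical placeholders for the motion-compensated reconstruction and the trained network; published autofocus frameworks often use evolutionary optimizers such as CMA-ES rather than the Nelder-Mead method shown here.

```python
# Hedged sketch of autofocus motion compensation: search over deformable-motion
# parameters (e.g., B-spline coefficients) to maximize the learned image-quality
# metric evaluated on the motion-compensated reconstruction.
import numpy as np
from scipy.optimize import minimize


def autofocus(reconstruct, metric_fn, n_params: int) -> np.ndarray:
    """Return motion parameters that maximize the learned metric."""

    def cost(params: np.ndarray) -> float:
        volume = reconstruct(params)       # motion-compensated reconstruction
        return -float(metric_fn(volume))   # negate: the optimizer minimizes

    x0 = np.zeros(n_params)                # start from the static (no-motion) assumption
    result = minimize(cost, x0, method="Nelder-Mead",
                      options={"maxiter": 500, "xatol": 1e-3})
    return result.x
```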