Decision support systems for surveillance increasingly rely on face recognition (FR) to detect target individuals of interest captured with video cameras. FR is a challenging problem in video surveillance due to variations in capture conditions, camera interoperability, and the limited representativeness of the target facial models used for matching. Although adaptive classifier ensembles have been applied for robust face matching, it is often assumed that the proportions of faces captured for target and non-target individuals are balanced, known a priori, and do not change over time. Recently, some techniques have been proposed to adapt the fusion function of an ensemble according to the class imbalance of the input data stream. For instance, Skew-Sensitive Boolean Combination (SSBC) is an active approach that periodically estimates target vs. non-target proportions during operations using the Hellinger distance, and adapts its ensemble fusion function to the operational class imbalance. Beyond the challenges of estimating class imbalance, such techniques commonly generate diverse pools of classifiers by selecting balanced training data, which limits the potential diversity that could be produced using the abundant non-target data. In this paper, adaptive skew-sensitive ensembles are proposed to combine classifiers trained by selecting data with varying levels of imbalance and complexity, in order to sustain a high level of performance for video-to-video FR. Faces captured for each person in the scene are tracked and regrouped into trajectories. During enrollment, captures in a reference trajectory are combined with selected non-target captures to generate a pool of 2-class classifiers using data with various levels of imbalance and complexity. During operations, the level of imbalance is periodically estimated from the input trajectories using the HDx quantification method and pre-computed histogram representations of imbalanced data distributions. 
This approach allows one to adapt pre-computed histograms and ensemble fusion functions based on the imbalance and complexity of operational data. Finally, the ensemble scores are accumulated over trajectories for robust spatio-temporal recognition. Results on synthetic data show that adapting the fusion function of ensembles trained with different complexities and levels of imbalance can significantly improve performance. Results on the Face in Action video data show that the proposed method can outperform reference techniques (including SSBC and meta-classification) in imbalanced video surveillance environments. Transaction-based analysis shows that performance is consistently higher across operational imbalances. Individual-specific analysis indicates that goat- and lamb-like individuals can benefit the most from adaptation to the operational imbalance. Finally, trajectory-based analysis shows that a video-to-video FR system based on the proposed approach can maintain, and even improve, overall system discrimination.
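
The imbalance estimation step mentioned above can be illustrated with a minimal sketch of Hellinger-distance quantification over feature histograms, in the spirit of HDx. It is not the implementation used in this paper. The function names, bin count, grid of candidate proportions, and shared bin edges are all assumptions made for illustration, and the distance uses the common normalized form bounded by 1.

```python
import numpy as np


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (normalized form, in [0, 1])."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))


def feature_histograms(X, edges):
    """Normalized histogram of every feature column of X, using the given bin edges."""
    hists = []
    for j in range(X.shape[1]):
        counts, _ = np.histogram(X[:, j], bins=edges[j])
        hists.append(counts / max(counts.sum(), 1))
    return np.array(hists)


def estimate_target_proportion(X_target, X_nontarget, X_operational,
                               n_bins=10, n_steps=101):
    """Estimate the proportion of target samples in unlabeled operational data.

    For each candidate proportion a, the target and non-target histograms are
    mixed as a * target + (1 - a) * non-target, and compared with the
    operational histogram. The candidate with the smallest Hellinger distance,
    averaged over features, is returned.
    """
    # Assumption: bin edges are shared across all three sets, feature by feature.
    pooled = np.vstack([X_target, X_nontarget, X_operational])
    edges = [np.linspace(pooled[:, j].min(), pooled[:, j].max(), n_bins + 1)
             for j in range(pooled.shape[1])]

    h_pos = feature_histograms(X_target, edges)
    h_neg = feature_histograms(X_nontarget, edges)
    h_op = feature_histograms(X_operational, edges)

    candidates = np.linspace(0.0, 1.0, n_steps)
    mean_dists = [
        np.mean([hellinger(a * h_pos[j] + (1.0 - a) * h_neg[j], h_op[j])
                 for j in range(h_op.shape[0])])
        for a in candidates
    ]
    return float(candidates[int(np.argmin(mean_dists))])
```

In such a sketch, the target and non-target histograms would be computed once from reference data, while the operational histogram would be recomputed periodically from the captures in the input trajectories. The estimated proportion could then be used to select the fusion function that matches the current imbalance.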