Computer-aided detection of cognitive impairment has garnered increasing attention, as multimodal sensing technology on digital devices offers community-dwelling older adults access to more objective, ecologically valid, and convenient cognitive assessments. In this study, we aimed to develop an automated method for screening cognitive impairment, building on paper-based and electronic Trail Making Tests (TMTs). We proposed a novel deep representation learning approach named Semi-Supervised Vector Quantised-Variational AutoEncoder (S2VQ-VAE). Within S2VQ-VAE, we incorporated intra- and inter-class correlation losses to disentangle class-related factors. These factors were then combined with various features obtainable in real time (including demographic, time-related, pressure-related, and jerk-related features) to form a robust feature engineering block. Finally, we identified the light gradient boosting machine (LightGBM) as the optimal classifier. The experiments were conducted on a dataset collected from community-dwelling older adults. The results showed that the proposed multi-type feature fusion method outperformed both the conventional scoring method used in paper-based TMTs and existing VAE-based feature extraction in terms of screening performance. In conclusion, the proposed deep representation learning method substantially enhances the diagnostic capability of behavior-based TMTs and streamlines large-scale community-based screening for cognitive impairment while reducing the workload of professional healthcare staff.
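The abstract does not give the formulas for the intra- and inter-class correlation losses, so the following is only a minimal illustrative sketch of one common way such losses are realized on latent codes: an intra-class term that pulls each latent vector toward its class centroid (compacting class-related factors), and an inter-class hinge term that pushes class centroids apart. The function name `class_correlation_losses`, the margin parameter, and the exact formulation are hypothetical, not the paper's definition.

```python
import numpy as np

def class_correlation_losses(z, y, margin=1.0):
    """Hypothetical sketch of intra-/inter-class losses on latent codes.

    z: (N, D) array of latent vectors; y: (N,) integer class labels.
    Returns (intra, inter):
      intra -- mean squared distance of latents to their class centroid,
               encouraging compact, class-related factors;
      inter -- squared hinge on pairwise centroid distances, penalizing
               class centroids that are closer than `margin`.
    """
    classes = np.unique(y)
    centroids = np.stack([z[y == c].mean(axis=0) for c in classes])

    # Intra-class: average within-class scatter around each centroid.
    intra = float(np.mean([
        np.sum((z[y == c] - centroids[i]) ** 2, axis=1).mean()
        for i, c in enumerate(classes)
    ]))

    # Inter-class: hinge penalty when two class centroids are too close.
    inter, n_pairs = 0.0, 0
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            d = np.linalg.norm(centroids[i] - centroids[j])
            inter += max(0.0, margin - d) ** 2
            n_pairs += 1
    inter /= max(n_pairs, 1)
    return intra, inter
```

In a training loop, terms like these would be weighted and added to the VQ-VAE reconstruction and codebook losses; the disentangled latents would then be concatenated with the demographic, time-, pressure-, and jerk-related features before classification.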