Multi-robot collaborative manufacturing (MRCMfg) is vital to effective and efficient operation in the manufacturing industry. In recent years, enabled by new-generation information technologies represented by the digital twin (DT), the digitalization of MRCMfg has progressed steadily. However, increasingly complex manufacturing tasks and more customized manufacturing demands pose challenges that limit the applicability of MRCMfg in practice. Recent successes in intelligent manufacturing show that empowering the DT with intelligence and cognition capabilities is a promising way to meet these challenges. Inspired by this, a cognitive digital twin (CDT) framework integrating multiple artificial intelligence algorithms is proposed for MRCMfg. The framework consists of three critical spaces: (i) Physical-virtual space: a multi-agent system-based DT; (ii) Cognition space: multi-behavior tree (BT) cognition models; (iii) Data space: data-driven cognition evolution. In the data space, multimodal perception is an important way to improve the cognition capabilities of MRCMfg. To relax the quality requirements on multimodal data and to improve the robustness of the perception model, an incomplete multimodal Transformer with deep autoencoder (IMTDAE) approach is proposed. The effectiveness and feasibility of the framework and of IMTDAE are verified by case studies. The results provide compelling evidence that the framework can substantially enhance the intelligence and cognition capabilities of MRCMfg.
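To make the incomplete-modality idea behind IMTDAE concrete, the following is a minimal conceptual sketch, not the paper's implementation: a tiny linear denoising autoencoder that reconstructs complete multimodal feature vectors from inputs in which one modality has been masked out, the role the deep autoencoder plays before fusion. All names, dimensions, and the synthetic data are hypothetical assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical modalities: 4-dim "vision" and 3-dim "force" features.
D_VIS, D_FRC = 4, 3
D = D_VIS + D_FRC
H = 5  # latent size of the autoencoder

# Synthetic complete samples (stand-in for real multimodal sensor data).
X = rng.normal(size=(256, D))

# Linear encoder/decoder weights.
We = rng.normal(scale=0.1, size=(D, H))
Wd = rng.normal(scale=0.1, size=(H, D))

def mask_modality(x, which):
    """Zero out one modality to simulate incomplete multimodal data."""
    x = x.copy()
    if which == "vision":
        x[:, :D_VIS] = 0.0
    else:
        x[:, D_VIS:] = 0.0
    return x

lr = 0.01
losses = []
for step in range(500):
    # Denoising objective: drop one modality, reconstruct the full input.
    x_in = mask_modality(X, "vision" if step % 2 == 0 else "force")
    z = x_in @ We        # encode incomplete input
    x_hat = z @ Wd       # decode toward the complete target
    err = x_hat - X
    losses.append(float(np.mean(err ** 2)))
    # Plain gradient descent on the reconstruction loss.
    gWd = z.T @ err / len(X)
    gWe = x_in.T @ (err @ Wd.T) / len(X)
    Wd -= lr * gWd
    We -= lr * gWe

print(losses[0], losses[-1])  # reconstruction error decreases over training
```

In the full IMTDAE design the reconstructed (imputed) modality embeddings would then feed a Transformer for cross-modal fusion; this sketch only isolates the autoencoder-based imputation step.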