The Rapid Serial Visual Presentation (RSVP) paradigm facilitates target identification in a rapid stream of pictures and is widely applied in military target surveillance and police monitoring. Most research has concentrated on single-target RSVP-BCI, whereas dual-target studies remain scarce, which considerably limits RSVP applications. This paper proposes a novel classification model, the Common Representation Extraction-Targeted Stacked Convolutional Autoencoder (CRE-TSCAE), to detect two target classes against a nontarget class in RSVP tasks. CRE generates a common representation for each target class to reduce variability across trials of the same class and to better distinguish the two targets. TSCAE controls uncertainty in the training process while requiring less target training data. The model learns compact, discriminative features from several learning tasks so as to distinguish each class effectively. It was validated on the World Robot Contest 2021 and 2022 ERP datasets. Experimental results showed that CRE-TSCAE outperformed state-of-the-art RSVP decoding algorithms, with an average accuracy of 71.25%, at least 6.5% higher than the other methods. This demonstrates that CRE-TSCAE has a strong ability to extract discriminative latent features that capture the differences between the two targets and the nontarget, which accounts for the increased classification accuracy. CRE-TSCAE provides an innovative and effective classification model for dual-target RSVP-BCI tasks and offers insights into the neurophysiological distinction between different targets.
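The abstract does not give layer-level details of TSCAE; purely as an illustration of the general idea of a stacked convolutional autoencoder whose latent code also feeds a three-class head (target A, target B, nontarget), a minimal PyTorch sketch could look as follows. All channel counts, epoch lengths, and layer sizes here are illustrative assumptions, not the published architecture.

    import torch
    import torch.nn as nn

    class ConvAEBlock(nn.Module):
        """One encoder/decoder stage; stages can be stacked."""
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv1d(in_ch, out_ch, kernel_size=5, stride=2, padding=2),
                nn.BatchNorm1d(out_ch),
                nn.ELU(),
            )
            self.decoder = nn.Sequential(
                nn.ConvTranspose1d(out_ch, in_ch, kernel_size=5, stride=2,
                                   padding=2, output_padding=1),
                nn.ELU(),
            )

        def forward(self, x):
            z = self.encoder(x)
            return z, self.decoder(z)

    class StackedCAEClassifier(nn.Module):
        """Two stacked autoencoder blocks plus a 3-way classification head."""
        def __init__(self, n_channels=64, n_samples=256, n_classes=3):
            super().__init__()
            self.block1 = ConvAEBlock(n_channels, 32)
            self.block2 = ConvAEBlock(32, 16)
            # After two stride-2 stages the time axis shrinks by a factor of 4.
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.Linear(16 * (n_samples // 4), n_classes),
            )

        def forward(self, x):
            z1, recon1 = self.block1(x)   # reconstruction losses can regularize training
            z2, recon2 = self.block2(z1)
            logits = self.classifier(z2)  # classification loss on the latent code
            return logits, (recon1, recon2)

    # Usage: a batch of 8 single-trial EEG epochs, 64 channels x 256 samples.
    model = StackedCAEClassifier()
    x = torch.randn(8, 64, 256)
    logits, recons = model(x)

In such a sketch, the reconstruction terms from each stacked block and the classification term on the latent features would be combined during training; how CRE-TSCAE actually weights and schedules these objectives, and how the common representation per target class enters the loss, is specified in the paper itself rather than here.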