In recent years, stance detection has become an important topic in natural language processing. Earlier work relied on feature engineering, which requires researchers to define and extract features suited to each particular application; this leads to poor generalization and a complex modeling process. Other researchers have applied deep learning methods. However, the popular convolutional neural network (CNN) approach suffers from information loss, and a CNN filter of a single size cannot accurately extract features of different lengths from text, so it cannot handle the diversity of features. To address these problems, we propose a two-channel CNN-GRU fusion network. First, a convolution layer with two filters of different window sizes extracts local features from the topic content and the text content. Then, a gated recurrent unit (GRU) network extracts the temporal characteristics of these features. Finally, the intermediate features are concatenated and fed to a classifier to complete the stance detection. We validate our method on data from NLPCC 2016. The experimental results show that the accuracy (ACC) and average F1 score of our method are, respectively, 13.1% and 15.6% higher than those of the SVM method, 6.2% and 11.6% higher than those of the CNN method, 5.6% and 3.3% higher than those of the GRU method, and 1.1% and 2.2% higher than those of the hybrid model proposed by Nanyu, which is used as a baseline, with no increase in run time. Our method also achieves the same accuracy as a second baseline, the semantic attention-based model proposed by Zhou, with less run time. In addition, our method classifies better than the single-channel model. Finally, we find that the run time of a multi-channel CNN-GRU increases gradually with the number of channels while the classification accuracy does not improve, so a two-channel CNN-GRU is the most appropriate choice.
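The data flow described above (convolution with two window sizes, a GRU over each convolved sequence, concatenation, then a classifier) can be illustrated with a minimal forward-pass sketch in plain Python. It is not the implementation used in this work. Several details are assumptions made only for illustration: each channel is taken to correspond to one window size, the topic and text tokens are taken to be one already-embedded sequence, the window sizes are 2 and 3, the layer sizes are arbitrary, bias terms are omitted, and the weights are random rather than trained. The three labels FAVOR, AGAINST, and NONE are assumed to be the label set.

```python
import math
import random

def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def conv1d_relu(seq, filters, window):
    """Slide a window over the token sequence; each filter sees `window` stacked embeddings."""
    out = []
    for start in range(len(seq) - window + 1):
        patch = [x for vec in seq[start:start + window] for x in vec]
        out.append([max(0.0, a) for a in matvec(filters, patch)])
    return out

def gru_last_state(seq, p, hidden):
    """Run a standard GRU (biases omitted) over the sequence; return the final hidden state."""
    h = [0.0] * hidden
    for x in seq:
        xh = x + h                                        # concatenate input and state
        z = [sigmoid(a) for a in matvec(p["Wz"], xh)]     # update gate
        r = [sigmoid(a) for a in matvec(p["Wr"], xh)]     # reset gate
        xrh = x + [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a) for a in matvec(p["Wh"], xrh)]
        h = [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]
    return h

def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

def build_model(embed_dim, n_filters, hidden, windows, n_classes, seed=0):
    """One channel per window size (an assumption of this sketch)."""
    rng = random.Random(seed)
    channels = []
    for k in windows:
        channels.append({
            "window": k,
            "filters": rand_matrix(n_filters, k * embed_dim, rng),
            "gru": {name: rand_matrix(hidden, n_filters + hidden, rng)
                    for name in ("Wz", "Wr", "Wh")},
        })
    out = rand_matrix(n_classes, hidden * len(windows), rng)
    return {"channels": channels, "hidden": hidden, "out": out}

def forward(model, seq):
    fused = []
    for ch in model["channels"]:
        local = conv1d_relu(seq, ch["filters"], ch["window"])       # local features
        fused += gru_last_state(local, ch["gru"], model["hidden"])  # sequence features
    return softmax(matvec(model["out"], fused))                     # classifier on fused vector

LABELS = ["FAVOR", "AGAINST", "NONE"]  # assumed label set
rng = random.Random(1)
tokens = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(12)]  # 12 hypothetical token embeddings
model = build_model(embed_dim=8, n_filters=4, hidden=5, windows=(2, 3), n_classes=len(LABELS))
probs = forward(model, tokens)
```

Passing a longer `windows` tuple adds channels, and the per-channel convolution and GRU passes then repeat once per window size. This is consistent with the observation that run time grows with the number of channels.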