Forecasting the stock market from historical financial data matters to market participants because even a slight improvement in prediction accuracy can lead to better trading decisions. The data used for such forecasting has evolved from a single source, either text or stock prices, to the fusion of multisource information. However, adaptively fusing numerical data and text so that a prediction model can learn time-series information in parallel remains a challenging problem. In this paper, we propose a collaborative attention Transformer fusion model for stock movement prediction (CoATSMP). The model comprises parallel extraction of text and price features, parameter-level fusion, and a joint feature processing module, and it uses a soft fusion method to deeply fuse text and stock prices. Experiments show that (1) the proposed approach outperforms the baselines; (2) within the CoATSMP framework, the proposed soft fusion method models the data better than the alternative fusion methods and yields larger gains in prediction performance; (3) models that use both prices and text outperform those that use only one data source; and (4) quantitative analysis of the results indicates that text plays a relatively more critical role than prices in the CoATSMP framework. Simulated trading on real market data shows that a trading strategy based on CoATSMP can significantly increase profits, indicating that the model has practical application value.
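The abstract does not specify how CoATSMP's collaborative attention or soft fusion is computed. As a rough illustration of what bidirectional (co-)attention fusion of two modalities generally looks like, the following NumPy sketch assumes standard scaled dot-product cross-attention in both directions, followed by a learned sigmoid gate as one possible "soft" fusion. It also assumes that price and text features are already aligned per trading day with a shared width `d`. The function names, shapes, and the gating scheme are hypothetical and are not the paper's method.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, context):
    # Scaled dot-product attention: each query step attends over all context steps.
    d = query.shape[-1]
    scores = query @ context.T / np.sqrt(d)   # (T_q, T_c)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ context                  # (T_q, d)

def co_attention_fuse(price, text, w_gate, b_gate):
    # Hypothetical co-attention: prices attend to text, and text attends to prices.
    price_ctx = cross_attention(price, text)  # (T, d)
    text_ctx = cross_attention(text, price)   # (T, d)
    # Hypothetical soft fusion: a per-feature sigmoid gate in (0, 1) mixes the two streams.
    joint = np.concatenate([price_ctx, text_ctx], axis=-1)  # (T, 2d)
    gate = 1.0 / (1.0 + np.exp(-(joint @ w_gate + b_gate)))  # (T, d)
    return gate * price_ctx + (1.0 - gate) * text_ctx

# Usage with random stand-in features: 5 trading days, 8-dimensional embeddings.
rng = np.random.default_rng(0)
T, d = 5, 8
price = rng.normal(size=(T, d))
text = rng.normal(size=(T, d))
w_gate = rng.normal(scale=0.1, size=(2 * d, d))
b_gate = np.zeros(d)
fused = co_attention_fuse(price, text, w_gate, b_gate)
print(fused.shape)  # (5, 8)
```

Because the gate lies strictly between 0 and 1, each fused feature is a convex combination of the price-side and text-side attention outputs. This is one simple way a model could learn how much weight each source receives, rather than fixing the mix in advance.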