Steganalysis is an important and challenging problem in multimedia forensics. Many deep networks have been proposed to improve the detection of steganographic traces, and existing methods focus on ever-deeper structures. However, as a model deepens, gradient backpropagation cannot be guaranteed to flow through the weights of every module, which makes the network difficult to train; in addition, deeper structures consume more GPU computing resources. To reduce computation and accelerate training, we propose a novel architecture that combines batch normalization with shallow layers. To reduce the loss of the subtle information that steganalysis depends on, we decrease the depth and increase the width of the network and abandon max-pooling layers. To shorten the lengthy training process under different payloads, we propose two transfer learning schemes, parameter multiplexing and fine-tuning, which improve overall efficiency. We demonstrate the effectiveness of our method on two steganographic algorithms, WOW and S-UNIWARD. Compared with SRM and Ye.net, our model achieves better detection performance on the BOSSbase database while improving efficiency.
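
The batch normalization used in the proposed shallow architecture can be illustrated with a minimal NumPy sketch. This is not the paper's implementation, only the standard per-feature normalize-scale-shift operation that the abstract refers to; the array shapes and parameter names are illustrative assumptions.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Standard batch normalization over the batch axis.

    x:     (batch, features) activations
    gamma: (features,) learned scale
    beta:  (features,) learned shift
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero-mean, unit-variance per feature
    return gamma * x_hat + beta

# Illustrative usage: activations with a nonzero mean and large variance
# are rescaled to a well-conditioned distribution, which is what lets a
# shallow network train quickly without very careful initialization.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(64, 8))
y = batch_norm(x, gamma=np.ones(8), beta=np.zeros(8))
```

After normalization, each feature column of `y` has (near-)zero mean and unit variance, regardless of the input statistics.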
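
The parameter-multiplexing transfer scheme can be sketched abstractly: weights trained under one payload initialize the model for another payload, after which only part of the network is fine-tuned. The layer names and dictionary representation below are hypothetical placeholders, not the paper's actual network.

```python
def multiplex_parameters(source_params, target_params, shared_layers):
    """Copy the weights of the named layers from a source model
    (e.g. trained at a high payload) into a target model that will
    be fine-tuned at a different payload; all other target layers
    keep their own initialization.
    """
    for name in shared_layers:
        target_params[name] = list(source_params[name])  # copy, don't alias
    return target_params

# Hypothetical weight dictionaries standing in for network state.
source = {"conv1": [0.1, 0.2], "conv2": [0.3], "fc": [0.9]}
target = {"conv1": [0.0, 0.0], "conv2": [0.0], "fc": [0.0]}
updated = multiplex_parameters(source, target, shared_layers=["conv1", "conv2"])
```

Here the convolutional layers are reused across payloads while the classifier layer (`fc`) is left to be retrained, which is the usual reason such a scheme shortens training.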