Facial expressions convey different emotions through varying degrees of facial muscle deformation, and expression images produced spontaneously in real-world scenes are susceptible to interference from varying illumination angles and head poses. This paper proposes an end-to-end emotion classification method based on a bilinear convolutional neural network that extracts second-order features from expression images. The method optimizes the network structure by adding batch normalization layers, enlarging the max-pooling kernel, and replacing the fully connected layer with a global average pooling layer. Finally, bilinear pooling with normalization converts the one-dimensional features into two-dimensional second-order features, which are used for the final classification. In tests on the FER2013 dataset, the method achieves an accuracy of 73.2%, which is 0.5% higher than the current state-of-the-art method, demonstrating that second-order image features are more conducive to expression classification than first-order features. Moreover, compared with other mainstream methods, the proposed improved model achieves higher recognition accuracy.
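The abstract's key operation, converting convolutional features into normalized second-order (bilinear) features, can be sketched as follows. This is a minimal NumPy illustration of standard bilinear pooling with signed-square-root and L2 normalization; the function name, feature shapes, and normalization order are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def bilinear_pool(feat_a, feat_b):
    """Pool two feature maps of shape (C, H*W) into a normalized
    second-order feature vector of length C*C.

    Illustrative sketch only; the paper's actual pipeline may differ.
    """
    c, n = feat_a.shape
    # Outer product of channel activations, averaged over all
    # spatial locations -> a (C, C) second-order statistic.
    outer = feat_a @ feat_b.T / n
    vec = outer.reshape(-1)
    # Signed square-root normalization, commonly used with
    # bilinear features to temper large activations.
    vec = np.sign(vec) * np.sqrt(np.abs(vec))
    # L2 normalization so the final descriptor has unit length.
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

# Example: two 64-channel feature maps over a 7x7 spatial grid,
# e.g. the outputs of two CNN branches (shapes are assumptions).
rng = np.random.default_rng(0)
a = rng.standard_normal((64, 49))
b = rng.standard_normal((64, 49))
v = bilinear_pool(a, b)
print(v.shape)  # (4096,)
```

The resulting 4096-dimensional vector is the second-order descriptor that a classifier head (e.g. softmax) would consume in place of first-order pooled features.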