Human behavior recognition often confuses categories because feature information is extracted insufficiently and the temporal correlation between video frames is underused. To address this, we propose a human behavior recognition algorithm based on a two-stream heterogeneous grafting network with time correlation sampling. For video sampling, we adopt time correlation sampling of video frames based on the TRN algorithm, which removes redundant information in adjacent frames and makes full use of the temporal correlation between different stages of a video. Many easily confused behaviors can be distinguished well by how they change along the time dimension. Because behavior feature information is complex and diverse, we build a two-stream heterogeneous network on the basic structure of the traditional two-stream network. To make the convolution kernels more efficient, we apply filter grafting to the invalid filters in the DenseNet and BN-Inception networks by grafting filter weights onto them, which effectively improves the model's ability to express feature information. The improved DenseNet and BN-Inception networks then form the two-stream heterogeneous network that extracts spatial and temporal information. Experimental results show that the method achieves accuracies of 89.3% on the UCF101 dataset and 92.1% on the KTH dataset, which demonstrates the effectiveness of the algorithm.
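The idea of sampling across video stages, rather than taking dense adjacent frames, can be illustrated with a short sketch. It is only an illustration under stated assumptions and not the sampling procedure used in this work: the function name `sample_segment_indices`, the split into equal-length stages, and the choice of one frame per stage are all hypothetical.

```python
import random


def sample_segment_indices(num_frames, num_segments, rng=None):
    """Pick one frame index from each of `num_segments` equal-length stages.

    Hypothetical helper for illustration only.
    With rng=None the middle frame of each stage is returned (deterministic);
    otherwise one random frame inside each stage is drawn.
    """
    if num_segments <= 0 or num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    indices = []
    for k in range(num_segments):
        # Stage k covers frame indices [start, end).
        start = k * num_frames // num_segments
        end = (k + 1) * num_frames // num_segments
        if rng is None:
            indices.append((start + end - 1) // 2)
        else:
            indices.append(rng.randrange(start, end))
    return indices


# Deterministic sampling: one frame from each of 3 stages of a 30-frame clip.
center_indices = sample_segment_indices(30, 3)

# Randomized sampling, as might be used for training-time augmentation.
random_indices = sample_segment_indices(100, 8, rng=random.Random(0))
```

Because each index comes from a different stage, the sampled frames are ordered in time and spread over the whole clip, so adjacent near-duplicate frames are skipped while the coarse temporal progression of the action is kept.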