Human activity recognition (HAR) is a vital human–computer interaction service in smart homes. It remains a challenging task because human actions are both diverse and similar to one another. In this paper, a novel hierarchical deep learning-based methodology built on low-cost sensors is proposed for high-accuracy, device-free human activity recognition. ESP8266 modules served as the sensing hardware, forming the WiFi sensor network and collecting multi-dimensional received signal strength indicator (RSSI) records. The proposed learning model is a coarse-to-fine hierarchical classification framework with two perception levels. In the coarse-level stage, twelve statistical features from the time and frequency domains were extracted from RSSI measurements filtered by a Butterworth low-pass filter, and a support vector machine (SVM) model classified these features to quickly recognize the basic human activities. In the fine-level stage, a gated recurrent unit (GRU), a representative type of recurrent neural network (RNN), was applied to resolve the confusion between similar activities. The GRU model extracts multi-level features automatically from the RSSI measurements and accurately discriminates between similar activities. The experimental results show that the proposed approach achieved recognition accuracies of 96.45% and 94.59% for six types of activities in two different environments and outperformed traditional pattern-based methods. The proposed hierarchical learning method provides a low-cost sensor-based HAR framework that improves both recognition accuracy and modeling efficiency.
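The coarse-to-fine control flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the six statistics below are illustrative rather than the twelve features used in the paper, the routing rule (pass a window to the fine stage only when the coarse label falls in a confusable group) is an assumption, and `toy_coarse`, `toy_fine`, and the activity names are hypothetical stand-ins for the trained SVM, the trained GRU, and the real label set. Filtering is omitted.

```python
import statistics
from typing import Callable, List, Sequence, Set


def window_features(rssi: Sequence[float]) -> List[float]:
    """Summarize one RSSI window with simple time-domain statistics.

    Illustrative only: these are not the paper's twelve features.
    """
    lo, hi = min(rssi), max(rssi)
    return [
        statistics.fmean(rssi),   # mean
        statistics.pstdev(rssi),  # population standard deviation
        lo,                       # minimum
        hi,                       # maximum
        hi - lo,                  # range
        statistics.median(rssi),  # median
    ]


def hierarchical_predict(
    window: Sequence[float],
    coarse: Callable[[List[float]], str],
    fine: Callable[[Sequence[float]], str],
    confusable: Set[str],
) -> str:
    """Run the coarse classifier on features; defer to the fine
    classifier on the raw window only for confusable coarse labels
    (assumed routing rule)."""
    label = coarse(window_features(window))
    if label in confusable:
        return fine(window)
    return label


def toy_coarse(features: List[float]) -> str:
    """Hypothetical stand-in for the SVM: threshold on the std. deviation."""
    return "static" if features[1] < 1.0 else "moving"


def toy_fine(window: Sequence[float]) -> str:
    """Hypothetical stand-in for the GRU: count crossings of the mean."""
    mean = statistics.fmean(window)
    crossings = sum(
        1 for a, b in zip(window, window[1:]) if (a - mean) * (b - mean) < 0
    )
    return "walking" if crossings >= 3 else "falling"


still = [-50.0, -50.2, -49.9, -50.1, -50.0, -49.8]
walk = [-50.0, -44.0, -51.0, -43.0, -52.0, -45.0]

print(hierarchical_predict(still, toy_coarse, toy_fine, {"moving"}))
print(hierarchical_predict(walk, toy_coarse, toy_fine, {"moving"}))
```

In this arrangement the cheaper feature-based classifier handles every window, and the sequence model runs only on windows whose coarse label is ambiguous, which is one way the hierarchy could reduce modeling cost.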