In this article, we present a system that uses a smartphone and an off-the-shelf WiFi router to recognize human activities at various scales. The router serves as a hotspot that transmits WiFi packets. The smartphone runs customized firmware and purpose-built software to capture WiFi channel state information (CSI) data. We extract features from the CSI data associated with specific human activities and use these features to classify the activities with machine learning models. To evaluate system performance, we test 20 types of human activities at different scales: seven small motions, four medium motions, and nine big motions. We recruit 60 participants and spend 140 hours collecting data under various experimental settings, yielding 36 000 data points in total. For comparison, we adopt three distinct machine learning models: convolutional neural networks (CNNs), decision trees, and long short-term memory (LSTM) networks. The results demonstrate that our system predicts these human activities with an overall accuracy of 97.25%. Specifically, it achieves a mean accuracy of 97.57% on small-scale motions, which are particularly useful for gesture recognition. We also examine how well each machine learning algorithm adapts to classifying the motions and find that the CNN achieves the best prediction accuracy. Our system thus enables human activity recognition in a more ubiquitous and mobile fashion, which can potentially enhance a wide range of applications such as gesture control and sign language recognition.
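To make the pipeline described above concrete (CSI capture, feature extraction, classification), the following is a minimal sketch under stated assumptions, not the implementation used in this work. The CSI windows are synthetic, the features (per-subcarrier mean and standard deviation of amplitude) are an illustrative choice, and a nearest-centroid classifier stands in for the CNN, decision tree, and LSTM models. All names, including `synth_window`, `window_features`, and `SCALES`, are hypothetical.

```python
import cmath
import math
import random
import statistics

# Hypothetical motion scales, used only to generate synthetic data (not from the article).
SCALES = {"small": 0.05, "medium": 0.3, "big": 0.8}


def synth_window(scale, rng, n_packets=50, n_subcarriers=8):
    """Build a synthetic CSI window: a list of packets, each a list of complex values."""
    window = []
    for t in range(n_packets):
        packet = []
        for _ in range(n_subcarriers):
            # Assumption: larger motions modulate the amplitude more strongly.
            amp = 1.0 + scale * math.sin(2 * math.pi * t / 10) + rng.gauss(0, 0.02)
            phase = rng.uniform(-math.pi, math.pi)
            packet.append(cmath.rect(amp, phase))
        window.append(packet)
    return window


def csi_amplitudes(packet):
    """Amplitude of each complex subcarrier value in one packet."""
    return [abs(h) for h in packet]


def window_features(window):
    """Mean and standard deviation of amplitude over time, per subcarrier."""
    amps = [csi_amplitudes(p) for p in window]
    feats = []
    for series in zip(*amps):  # one amplitude time series per subcarrier
        feats.append(statistics.fmean(series))
        feats.append(statistics.pstdev(series))
    return feats


def fit_centroids(features, labels):
    """Average the feature vectors of each class (stand-in for model training)."""
    groups = {}
    for f, y in zip(features, labels):
        groups.setdefault(y, []).append(f)
    return {y: [statistics.fmean(col) for col in zip(*rows)]
            for y, rows in groups.items()}


def predict(centroids, feats):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda y: math.dist(centroids[y], feats))


def demo_accuracy(seed=0, n_train=10, n_test=10):
    """Train and evaluate on synthetic windows; returns the fraction classified correctly."""
    rng = random.Random(seed)
    train_x, train_y, test_x, test_y = [], [], [], []
    for label, scale in SCALES.items():
        for i in range(n_train + n_test):
            feats = window_features(synth_window(scale, rng))
            if i < n_train:
                train_x.append(feats)
                train_y.append(label)
            else:
                test_x.append(feats)
                test_y.append(label)
    centroids = fit_centroids(train_x, train_y)
    hits = sum(predict(centroids, f) == y for f, y in zip(test_x, test_y))
    return hits / len(test_y)


if __name__ == "__main__":
    print(f"accuracy on synthetic windows: {demo_accuracy():.2f}")
```

Because the data here is synthetic and cleanly separated, the accuracy this sketch prints says nothing about the reported results; it only illustrates how amplitude statistics from CSI windows can feed a classifier.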