The advent of depth sensors opens up new opportunities for human action recognition by providing depth information. This paper presents an effective method for human action recognition from depth images. First, a multilevel frame select sampling (MFSS) method is proposed to generate three levels of temporal samples from the input depth sequences. Then, the proposed motion and static mapping (MSM) method is used to obtain representations of the MFSS sequences. After that, a block-based local binary pattern (LBP) feature extraction approach is used to extract feature information from the MSM representations. Finally, the Fisher kernel representation is applied to aggregate the block features, and the resulting representation is fed to a kernel-based extreme learning machine classifier. The developed framework is evaluated on three public datasets captured by depth cameras, and the experimental results show that it compares favorably with existing approaches.
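The block-based LBP step can be sketched as follows. This is a minimal illustration, not the paper's implementation. It assumes the basic 8-neighbour, 256-bin LBP operator and a uniform grid of blocks, because the abstract does not specify the LBP variant, neighbourhood radius, or block layout. The function names `lbp_image` and `block_lbp_features` are hypothetical. Each block yields one normalized histogram, so the output is a set of per-block descriptors that a later aggregation stage (such as the Fisher kernel encoding mentioned above) could consume.

```python
import numpy as np


def lbp_image(img: np.ndarray) -> np.ndarray:
    """Basic 3x3 LBP (assumed variant): threshold the 8 neighbours of each
    interior pixel against the centre and pack the results into an 8-bit code."""
    img = np.asarray(img, dtype=np.float64)
    center = img[1:-1, 1:-1]
    h, w = center.shape
    # Neighbour offsets, clockwise from the top-left pixel; bit i <- offsets[i].
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros((h, w), dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        codes |= (neighbour >= center).astype(np.uint8) << np.uint8(bit)
    return codes


def block_lbp_features(img: np.ndarray, grid=(4, 4)) -> np.ndarray:
    """Split the LBP code map into a grid of blocks (hypothetical layout) and
    return one L1-normalized 256-bin histogram per block, shape (n_blocks, 256)."""
    codes = lbp_image(img)
    feats = []
    for row_block in np.array_split(codes, grid[0], axis=0):
        for block in np.array_split(row_block, grid[1], axis=1):
            hist = np.bincount(block.ravel(), minlength=256).astype(np.float64)
            hist /= max(hist.sum(), 1.0)  # guard against empty blocks
            feats.append(hist)
    return np.stack(feats)


if __name__ == "__main__":
    # A random array stands in for one MSM map (assumption for illustration).
    rng = np.random.default_rng(0)
    msm_map = rng.random((64, 64))
    descriptors = block_lbp_features(msm_map, grid=(4, 4))
    print(descriptors.shape)  # (16, 256)
```

Keeping the block histograms separate, rather than concatenating them into one long vector, matches the abstract's description of aggregating block features in a subsequent step.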