Facial micro-expressions can reveal a person’s actual mental state and emotions, so their recognition has crucial applications in many fields, such as lie detection, clinical medicine, and defense security. However, conventional methods extract features from predesigned facial regions, and because micro-expressions are localized and asymmetric, these regions often miss the areas that are critical for recognition. We therefore propose the Haphazard Cuboids (HC) feature extraction method, which generates target regions by haphazard sampling and then extracts micro-expression spatio-temporal features. HC consists of two modules: spatial patches generation (SPG) and temporal segments generation (TSG). SPG generates localized facial regions, and TSG generates temporal intervals. Extensive experiments demonstrate the superiority of the proposed method. We then analyze the two modules with conventional and deep-learning methods and find that each of them significantly improves micro-expression recognition performance.
Specifically, we embed the SPG module into deep-learning methods and experimentally demonstrate the effectiveness and superiority of our proposed sampling method in comparison with state-of-the-art methods. Furthermore, we analyze the TSG module with the maximum overlapping interval (MOI) method and find that it is consistent with the maximum interval of the apex frame distribution in CASME II and SAMM. Therefore, analogous to the region of interest (ROI) on the human face, micro-expressions also have an ROI in the temporal dimension, whose position is highly relevant to the most intense moment, i.e., the apex frame.
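The idea of generating target regions by haphazard sampling can be illustrated with a minimal sketch. It is not the implementation evaluated in this work: the function names `sample_cuboids` and `cuboid_features`, the uniform size distribution, the `min_frac` and `max_frac` bounds, and the mean absolute frame difference used as a descriptor are all assumptions made for illustration. Each cuboid pairs a random spatial patch (in the spirit of SPG) with a random temporal segment (in the spirit of TSG).

```python
import numpy as np


def sample_cuboids(video_shape, n_cuboids, rng, min_frac=0.1, max_frac=0.5):
    """Sample random cuboids inside a (T, H, W) video volume.

    Returns an (n_cuboids, 6) integer array with rows
    (t0, t1, y0, y1, x0, x1); end indices are exclusive.
    The uniform size distribution is an assumption of this sketch.
    """
    dims = np.array(video_shape)  # (T, H, W)
    # Random extent along each axis, as a fraction of the axis length (at least 1).
    fracs = rng.uniform(min_frac, max_frac, size=(n_cuboids, 3))
    sizes = np.maximum(1, (fracs * dims).astype(int))
    # Random start chosen so that the cuboid stays inside the volume.
    starts = (rng.uniform(0.0, 1.0, size=(n_cuboids, 3)) * (dims - sizes + 1)).astype(int)
    ends = starts + sizes
    return np.stack(
        [starts[:, 0], ends[:, 0], starts[:, 1], ends[:, 1], starts[:, 2], ends[:, 2]],
        axis=1,
    )


def cuboid_features(video, cuboids):
    """Stand-in descriptor: mean absolute frame-to-frame difference per cuboid.

    A real pipeline would use a proper spatio-temporal descriptor here;
    this placeholder only shows where the cuboids are consumed.
    """
    feats = np.zeros(len(cuboids))
    for i, (t0, t1, y0, y1, x0, x1) in enumerate(cuboids):
        block = video[t0:t1, y0:y1, x0:x1]
        if block.shape[0] > 1:  # a single-frame cuboid has no temporal change
            feats[i] = np.abs(np.diff(block, axis=0)).mean()
    return feats


rng = np.random.default_rng(0)
video = rng.random((30, 64, 64))  # synthetic clip: 30 frames of 64x64 pixels
cuboids = sample_cuboids(video.shape, n_cuboids=100, rng=rng)
features = cuboid_features(video, cuboids)
```

Because the cuboids are drawn without a fixed facial layout, some of them can cover localized or asymmetric regions that a predesigned grid might miss, which is the motivation stated above.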