Sign language is the primary means of communication for the deaf community. However, most people do not know this language, which creates a communication barrier. Many technological solutions have been proposed to overcome this issue. Some rely on wearable devices, but gesture recognition in video sequences is a cheaper and less intrusive alternative. In this work, we address the problem of gesture recognition in video. To do so, we employ a two-step method consisting of feature space mapping and classification. First, the body parts of each subject in a video are segmented by a deep neural network. Then, we use the Gait Energy Image to encode the motion of the body parts in a compact feature space. Small datasets are a common problem in this type of application, leading to sparse representations in the feature space. To mitigate this problem, we evaluate SMOTE as a data augmentation technique in the feature space, together with classical dimensionality reduction techniques. We evaluate our method on three challenging Brazilian sign language (Libras) datasets, CEFET/RJ-Libras, MINDS-Libras, and LIBRAS-UFOP, achieving global accuracies of 85.40±3.13%, 84.66±1.78%, and 64.91±3.79%, respectively, using singular value decomposition and a support vector machine.
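The sketch below illustrates the feature-space stage of the pipeline named above: SMOTE oversampling, SVD dimensionality reduction, and SVM classification, using scikit-learn and imbalanced-learn. The feature dimensions, hyperparameters, and random stand-in data are illustrative assumptions, not the paper's actual settings or segmented GEI features.

```python
# Minimal sketch of the feature-space pipeline: SMOTE augmentation,
# SVD reduction, and an SVM classifier. All shapes and hyperparameters
# here are assumptions for illustration only.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.decomposition import TruncatedSVD
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Stand-in for Gait Energy Image features: one flattened GEI per video,
# labeled with its sign class (random data with assumed shapes).
n_videos, feat_dim, n_classes = 200, 64 * 64, 5
X = rng.random((n_videos, feat_dim))
y = rng.integers(0, n_classes, size=n_videos)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# SMOTE: synthesize new training samples by interpolating between
# same-class neighbors in the feature space, densifying sparse classes.
X_aug, y_aug = SMOTE(k_neighbors=3, random_state=0).fit_resample(X_train, y_train)

# Project the high-dimensional GEI features onto a compact SVD basis.
svd = TruncatedSVD(n_components=50, random_state=0).fit(X_aug)

# Classify the reduced features with a support vector machine.
clf = SVC(kernel="rbf").fit(svd.transform(X_aug), y_aug)
print("accuracy:", accuracy_score(y_test, clf.predict(svd.transform(X_test))))
```

Note that SMOTE is applied only to the training split and the SVD basis is fitted on the augmented training data, so no information from the test set leaks into augmentation or projection.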