Spectroscopy is essential for understanding phenomena across many fields of study. In remote sensing, vegetation analysis is among the most prominent applications, and modeling insect damage in plants is a task essential to the proper management of agricultural farmland. Hyperspectral data, which can be acquired by field spectroscopy at the plant or leaf level, offer an indirect, rapid, and reliable indicator of plant health. However, the spectral redundancy inherent in such data challenges information extraction, making pre-processing an essential part of the analysis. Artificial intelligence techniques, mostly machine and deep learning methods, are now standard in data processing, and pre-processing is an integral part of their application. Yet few studies have measured the impact of pre-processing on vegetation monitoring, specifically for insect damage and spectral data. Here, we analyze the impact of pre-processing techniques on the performance of machine learning algorithms in classifying insect damage. For this, we used a field spectroradiometer that operates within the 350–1,000 nm and 1,000–2,500 nm ranges. The dataset comprised multiple spectral measurements of soybean plants taken on different days in a controlled environment. The pre-processing techniques investigated included baseline removal, smoothing, first- and second-order derivatives, standard normal variate (SNV), multiplicative scatter correction (MSC), and principal component analysis (PCA). Several machine learning algorithms and one deep learning method were applied to model the datasets. The impact of the pre-processing techniques was measured using accuracy-related validation metrics.
Our results indicated that the Extra-Tree (ExT) algorithm performed best, particularly on first-order derivative data (accuracy of 93.68%). A ranking approach indicated that the most contributive spectral region lies in the near-infrared, between 784 and 911 nm. Our investigation also demonstrates that a deep neural network (DNN) did not return a satisfactory result on raw reflectance data. However, when PCA was applied to the second-derivative data, the DNN achieved results similar to those of the ExT algorithm (accuracy of 91.95%). The implications of these findings, alongside the ranking approach, are discussed in this paper. We hope that the information presented here serves as a framework for future research applying pre-processing techniques alongside machine and deep learning methods to spectral data.
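
For readers unfamiliar with the transforms named above, the following is a minimal NumPy sketch of three of them (SNV, MSC, and a first-order derivative). It is an illustration under stated assumptions, not the processing code used in this study: the function names, the synthetic spectra, the 1 nm sampling step, the use of the mean spectrum as the MSC reference, and the finite-difference derivative are all choices made here for the example.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) on its own."""
    mean = spectra.mean(axis=1, keepdims=True)
    # Sample standard deviation (ddof=1) is an assumption; some tools use ddof=0.
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum.

    The reference defaults to the mean spectrum of the set (an assumption).
    """
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, row in enumerate(spectra):
        # Fit row ~ intercept + slope * ref, then undo the offset and the gain.
        slope, intercept = np.polyfit(ref, row, deg=1)
        corrected[i] = (row - intercept) / slope
    return corrected


def first_derivative(spectra, wavelengths):
    """First-order derivative of each spectrum with respect to wavelength,
    estimated with finite differences (no smoothing is applied here)."""
    return np.gradient(spectra, wavelengths, axis=1)


# Synthetic stand-in data, NOT the measurements described in this paper:
# one smooth curve distorted by a random gain and offset per spectrum.
rng = np.random.default_rng(0)
wavelengths = np.arange(350.0, 2501.0, 1.0)  # nm; 1 nm step is assumed
base = np.exp(-((wavelengths - 850.0) / 300.0) ** 2)
gains = rng.uniform(0.8, 1.2, size=(5, 1))
offsets = rng.uniform(-0.05, 0.05, size=(5, 1))
raw = gains * base + offsets

snv_spectra = snv(raw)
msc_spectra = msc(raw)
d1_spectra = first_derivative(raw, wavelengths)
```

Because the synthetic spectra differ only by a gain and an offset, both SNV and MSC collapse them onto a single curve, which is the scatter-removal behavior these transforms are meant to provide. On real, noisy reflectance data, derivatives are typically computed after smoothing, since differentiation amplifies noise.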