This paper presents a vision-based control strategy that employs deep learning for an aerial manipulation system developed for vegetation sampling in remote, dangerous environments. Vegetation sampling in such places poses considerable challenges, including equipment failures and exposure to hazardous elements. Controlling an aerial manipulator in unstructured environments such as forests remains difficult because of uncertainty, complex dynamics, and the risk of collisions. To address these issues, we propose a new image-based visual servoing (IBVS) method that uses knowledge distillation to provide robust, accurate, and adaptive control of the aerial vegetation sampler. A convolutional neural network (CNN) from a previous study detects the grasp point, which provides the feedback for the visual servoing loop. The proposed method improves the precision of visual servoing for sampling by applying a learning-based approach to grasp point selection and to the handling of camera calibration errors. Simulation results indicate that the system can track and sample tree branches with minimal error, demonstrating its potential to improve the safety and efficiency of aerial vegetation sampling.