The optimization algorithm plays a decisive role in the performance of deep learning models. Nevertheless, its effectiveness is directly influenced by the task, the properties of the data, and the model's architecture. Given the ambiguity surrounding method selection, analyzing the effect of different optimizers on specific tasks, such as change detection, is therefore highly valuable. The contribution of this paper is twofold. First, we investigate the performance of five first-order optimization methods, namely Momentum GD, NAG, AdaGrad, RMSProp, and Adam, in the context of change detection in remote sensing images using U-Net and DenseU-Net architectures. To the best of our knowledge, no comparative study of optimization methods has been conducted for change detection. The results reveal that the Adam optimizer is the preferred choice for training a U-Net-like architecture on this task. Challenges such as limited dataset size and irregular change patterns warrant further exploration. The second part of this study focuses on a detailed analysis of the Adam optimizer. We specifically examine its sensitivity to both model sparsity and data sparsity. To do so, we train several Convolutional Neural Network (CNN) architectures, each with a different degree of sparsity, on several datasets with various levels of sparsity. The experimental results show that the Adam optimizer is more influenced by model sparsity than by data sparsity, suggesting that it is well suited to training densely connected models.
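To make the comparison setup concrete, the sketch below shows how the five first-order optimizers could be instantiated and driven through an otherwise identical training step. This is a minimal illustration only, assuming a PyTorch implementation; the stand-in model, learning rates, and momentum values are illustrative assumptions, not the settings used in this study.

```python
# Minimal sketch (PyTorch assumed): the five first-order optimizers compared
# in this study, applied through an otherwise identical training step.
# Hyperparameters are illustrative assumptions, not the paper's settings.
import torch
import torch.nn as nn

# Stand-in for a U-Net-like change detection model (hypothetical placeholder).
model = nn.Conv2d(3, 1, kernel_size=3, padding=1)

optimizers = {
    "Momentum GD": torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9),
    "NAG": torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9,
                           nesterov=True),
    "AdaGrad": torch.optim.Adagrad(model.parameters(), lr=1e-2),
    "RMSProp": torch.optim.RMSprop(model.parameters(), lr=1e-3),
    "Adam": torch.optim.Adam(model.parameters(), lr=1e-3),
}

loss_fn = nn.BCEWithLogitsLoss()
x = torch.randn(4, 3, 64, 64)                     # dummy input images
y = torch.randint(0, 2, (4, 1, 64, 64)).float()   # dummy binary change mask

# Each optimizer takes one step on the shared stand-in model purely to show
# the common loop shape; a real comparison would train a fresh model per
# optimizer and evaluate on a held-out change detection benchmark.
for name, opt in optimizers.items():
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    print(f"{name}: loss = {loss.item():.4f}")
```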