As a primary optimization method for neural networks, the gradient descent algorithm has received significant attention throughout the recent development of deep neural networks. However, current gradient descent algorithms still suffer from drawbacks such as an excess of hyperparameters, entrapment in local optima, and poor generalization. This paper introduces a novel Caputo fractional-order gradient descent algorithm (MFFGD) to address these limitations. It provides fractional-order gradient derivations and error analyses for the different activation functions and loss functions used in a network, simplifying the computation of traditional fractional-order gradients. Additionally, by introducing a memory factor that records past gradient variations, MFFGD gains an adaptive adjustment capability. Comparative experiments were conducted on multiple datasets of different modalities, and the results, together with theoretical analysis, demonstrate the superiority of MFFGD over other optimizers.
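The abstract does not spell out the update rule, but the general shape of a Caputo-style fractional gradient step with a gradient memory can be sketched as follows. This is a minimal illustration, not the paper's MFFGD algorithm: it assumes the common first-order Caputo approximation `g_alpha = f'(theta) * |theta - c|^(1-alpha) / Gamma(2-alpha)` with the lower terminal `c` taken as the previous iterate, and it models the "memory factor" as a hypothetical exponential average of past fractional gradients. All parameter names (`alpha`, `beta`, `lr`) are placeholders chosen for this sketch.

```python
import math

def caputo_frac_gd(grad_f, theta0, alpha=0.9, lr=0.1, beta=0.5, steps=200):
    """Sketch of fractional-order gradient descent with a memory term.

    Assumptions (not taken from the paper): the Caputo fractional
    gradient of order alpha in (0, 1) is approximated to first order as
        g_alpha = f'(theta) * |theta - c|**(1 - alpha) / Gamma(2 - alpha),
    with the lower terminal c set to the previous iterate, and the
    'memory factor' is an exponential moving average of g_alpha.
    """
    theta = theta0
    prev = theta0 + 1.0   # assumed lower terminal c; offset avoids a zero base
    memory = 0.0
    for _ in range(steps):
        g = grad_f(theta)
        # Caputo-style rescaling of the integer-order gradient
        g_alpha = g * abs(theta - prev) ** (1 - alpha) / math.gamma(2 - alpha)
        # hypothetical memory factor: decaying record of past fractional gradients
        memory = beta * memory + (1 - beta) * g_alpha
        prev = theta
        theta = theta - lr * memory
    return theta

# Toy usage: minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3).
x_star = caputo_frac_gd(lambda x: 2.0 * (x - 3.0), theta0=0.0)
```

With `alpha = 1` the scaling factor reduces to 1 and the sketch collapses to ordinary gradient descent with momentum-like smoothing, which is the sense in which fractional-order methods generalize the integer-order case.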