Accurate prediction of crop yield is crucial for farmers, policymakers, and the agricultural industry. This article introduces CACN, a novel approach to crop yield forecasting based on a capsule neural network equipped with Conv-LSTM and an attention mechanism. The model combines the strengths of 3D CNN and Conv-LSTM, which capture the temporal dependencies and 3D features of crop yield data, with an attention mechanism that prioritizes the features most relevant to prediction. We evaluated CACN on a large dataset of soybean yield in the United States from 2003 to 2019 and compared it with several state-of-the-art deep learning models. The results indicate that the proposed approach outperforms the other models in terms of RMSE, correlation coefficient, and prediction error maps. Specifically, our model achieved an approximately 14% improvement in RMSE over the state-of-the-art model Deep-Yield. It also demonstrated the ability to extract more meaningful features and to capture the complex relationships between crop yield data and meteorological variables. Overall, the proposed method shows great potential for accurate and efficient crop yield forecasting and can be applied to other crops and regions.
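For readers unfamiliar with the evaluation metrics named above, the following is a minimal sketch of how RMSE, the Pearson correlation coefficient, and a relative RMSE improvement could be computed. The function names and the yield values are hypothetical and chosen only for illustration; they are not the study's data or results, and the sketch assumes the improvement is measured relative to the baseline's RMSE.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted yields."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between observed and predicted yields."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

def rmse_improvement(rmse_baseline, rmse_model):
    """Relative RMSE reduction of a model with respect to a baseline (assumed definition)."""
    return (rmse_baseline - rmse_model) / rmse_baseline

# Made-up yield values, for illustration only (not data from the study).
observed = np.array([40.0, 45.0, 50.0, 55.0])
baseline_pred = np.array([42.0, 43.0, 52.0, 53.0])  # hypothetical baseline model
model_pred = np.array([41.0, 44.0, 51.0, 54.0])     # hypothetical proposed model

baseline_rmse = rmse(observed, baseline_pred)
model_rmse = rmse(observed, model_pred)
print("baseline RMSE:", baseline_rmse)
print("model RMSE:", model_rmse)
print("relative improvement:", rmse_improvement(baseline_rmse, model_rmse))
print("model correlation:", pearson_r(observed, model_pred))
```

Under this definition, an improvement of roughly 14% would mean that the model's RMSE is about 0.86 times that of the baseline.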