Data assimilation (DA) is a powerful technique for improving the forecast accuracy of dynamic systems by optimally combining model forecasts with observations. Traditional DA approaches, however, face significant challenges when applied to complex, large-scale, highly nonlinear systems with sparse and noisy observations. To address these challenges, this study presents a new Neural Network-based Data Assimilation model (DANet) built on a Convolutional Long Short-Term Memory (ConvLSTM) architecture. DANet uses the neural network to learn the relationship between model forecasts, observations, and ground truth, enabling efficient DA for large-scale spatiotemporal forecasting with sparse observations. The effectiveness of DANet is demonstrated through an initial case study of wind-driven oceanic flow forecasting described by a Quasi-Geostrophic (QG) model. Compared with the traditional Ensemble Kalman Filter (EnKF), DANet performs better with both structured and unstructured sparse observations, as shown by lower Root Mean Square Errors (RMSEs) and higher correlation coefficients (R) and Structural Similarity Index (SSIM) values. Moreover, DANet is coupled with the QG model to operationally forecast vorticity and stream function over long horizons, further confirming its accuracy and reliability. In operational forecasting, DANet runs 60 times faster than EnKF, underscoring its efficiency and its potential for advancing DA.
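For readers less familiar with the EnKF baseline, the following is a minimal sketch of a textbook stochastic (perturbed-observation) EnKF analysis step with a sparse observation operator, together with the RMSE metric used for comparison. It rests on illustrative assumptions not taken from the study: a flattened 1-D state vector, a diagonal observation-error covariance R = obs_var·I, and an observation operator that simply selects observed grid points. The names `enkf_analysis` and `rmse` are hypothetical, and the sketch does not reproduce the study's QG configuration, ensemble size, or any localization or inflation it may use.

```python
import numpy as np


def rmse(estimate, truth):
    """Root Mean Square Error between two arrays of equal shape."""
    return float(np.sqrt(np.mean((estimate - truth) ** 2)))


def enkf_analysis(X_f, y, obs_idx, obs_var, rng):
    """Stochastic (perturbed-observation) EnKF analysis step (illustrative sketch).

    X_f     : (n, N) forecast ensemble, n state variables, N members
    y       : (m,) observations at the state indices listed in obs_idx
    obs_idx : (m,) indices of observed state variables (sparse observations)
    obs_var : scalar observation-error variance, assuming R = obs_var * I
    rng     : numpy Generator used to perturb the observations
    """
    n, N = X_f.shape
    m = len(obs_idx)

    # Observation operator H selects the observed entries of the state vector.
    H = np.zeros((m, n))
    H[np.arange(m), obs_idx] = 1.0
    R = obs_var * np.eye(m)

    # Ensemble anomalies and sample covariance P_f = A A^T / (N - 1).
    # Forming P_f explicitly is only practical for small illustrative states.
    A = X_f - X_f.mean(axis=1, keepdims=True)
    P_f = A @ A.T / (N - 1)

    # Kalman gain K = P_f H^T (H P_f H^T + R)^{-1}, computed with a linear
    # solve instead of an explicit inverse (S and P_f are symmetric).
    S = H @ P_f @ H.T + R
    K = np.linalg.solve(S, H @ P_f).T

    # Each member assimilates its own perturbed copy of the observations.
    Y = y[:, None] + rng.normal(0.0, np.sqrt(obs_var), size=(m, N))
    return X_f + K @ (Y - H @ X_f)
```

A usage pattern under the same assumptions: build a biased forecast ensemble around a known "truth", observe every fifth grid point, and compare the RMSE of the ensemble mean at the observed points before and after the analysis. Because each analysis step manipulates an ensemble of full model states, its cost grows with the ensemble size and state dimension, which is the kind of overhead a learned DA model such as DANet aims to avoid.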