Self-supervised monocular depth estimation methods have achieved remarkable results on sharp natural images. However, directly recovering depth from images blurred by long exposure under fast camera motion remains a serious challenge. To address this issue, we propose a unified framework for simultaneous deblurring and depth estimation (SDDE), which offers tighter task coupling and greater flexibility than simply concatenating a deblurring model with a depth estimation model. The framework benefits mainly from three components: (1) a novel Task-aware Fusion Module (TFM) that aggregates multi-scale features to adaptively select the shared intermediate features most relevant to each branch of the dual-decoder network; (2) a unique Spatial Interaction Module (SIM) that learns higher-order representations in the encoder stage to better describe the complex boundaries between classes in high-dimensional space, and that focuses on task-related regions by modeling the pairwise spatial correlations of the holistic tensor; (3) a priors-based composite regularization term that jointly optimizes the shared-encoder, dual-decoder network. We evaluated this work on multiple datasets, including Stereo Blur [1], KITTI [2], NYUv2 [3], REDS [4], and our own large-scale stereo blur dataset, achieving state-of-the-art results on both depth estimation and image deblurring.