Recently, convolutional neural networks (CNNs) and vision transformers (ViTs) have emerged as powerful tools for image restoration (IR). Nonetheless, each has inherent limitations: CNNs sacrifice the global receptive field, while ViTs demand large memory and graphics resources. To address these limitations and explore an alternative approach to improved IR performance, we propose two clustering-based frameworks for general IR tasks: the style-guided context cluster U-Net (SCoC-UNet) and the style-guided clustered point interaction U-Net (SCPI-UNet). SCoC-UNet adopts a U-shaped architecture comprising a position embedding module, an encoder, a decoder, and a reconstruction block. Specifically, the input low-quality image is viewed as a set of unorganized points, each of which is first assigned location information by a continuous relative position embedding method. These points are then fed into a symmetric encoder and decoder, which use style-guided context cluster (SCoC) blocks to extract latent contextual features and high-frequency information. Although SCoC-UNet achieves decent performance on image restoration, its SCoC block captures connectivity only among points within the same cluster and may therefore ignore long-range dependencies across different clusters. To address this issue, we further propose SCPI-UNet, built upon SCoC-UNet, which replaces the SCoC block with a style-guided clustered point interaction (SCPI) block. The SCPI block leverages a cross-attention mechanism to establish connections between feature points in different clusters. Extensive experiments demonstrate that the proposed SCoC-UNet and SCPI-UNet handle several typical IR tasks (i.e., JPEG compression artifact reduction, image denoising, and super-resolution) and achieve superior quantitative and qualitative performance over several state-of-the-art methods.
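The cross-cluster interaction described above can be illustrated with a minimal sketch. This is not the authors' SCPI implementation; the function names, shapes, and single-head scaled dot-product formulation are illustrative assumptions, showing only how points of one cluster could attend to feature points gathered from other clusters.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_cluster_attention(queries, keys_values):
    """Hypothetical sketch: attend from the points of one cluster
    (queries) to feature points gathered from other clusters.

    queries:     (n, d) feature points of the current cluster
    keys_values: (m, d) feature points from the other clusters
    returns:     (n, d) features enriched with cross-cluster context
    """
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)   # (n, m) similarities
    weights = softmax(scores, axis=-1)              # each row sums to 1
    return weights @ keys_values                    # weighted aggregation

rng = np.random.default_rng(0)
cluster_a = rng.normal(size=(4, 8))   # 4 points, 8-dim features
others = rng.normal(size=(6, 8))      # 6 points from other clusters
out = cross_cluster_attention(cluster_a, others)
print(out.shape)  # (4, 8)
```

Each output point is a similarity-weighted mixture of points outside its own cluster, which is how such a block can recover the long-range dependencies that per-cluster aggregation alone misses.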