Abstract

As an essential field of multimedia and computer vision, 3D shape recognition has attracted much research attention in recent years. Multiview‐based approaches have demonstrated their superiority in generating effective 3D shape representations. Typical methods extract multiview global features and aggregate them to generate 3D shape descriptors. However, this pipeline has two disadvantages. First, mainstream methods do not comprehensively explore the local information in each view. Second, many approaches aggregate multiview features coarsely, by adding or concatenating them, and the resulting loss of some discriminative characteristics limits the effectiveness of the representation. To address these problems, a novel architecture named region‐based joint attention network (RJAN) is proposed. Specifically, the authors first design a hierarchical local information exploration module for view descriptor extraction. The region‐to‐region and channel‐to‐channel relationships at different granularities are comprehensively explored and utilised to provide more discriminative characteristics for view feature learning. Subsequently, a novel relation‐aware view aggregation module is designed to aggregate the multiview features for shape descriptor generation, taking the view‐to‐view relationships into account. Extensive experiments were conducted on three public databases: ModelNet40, ModelNet10, and ShapeNetCore55. RJAN achieves state‐of‐the‐art performance in 3D shape classification and 3D shape retrieval, which demonstrates its effectiveness. The code has been released at https://github.com/slurrpp/RJAN.
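To make the contrast between plain averaging and relation‐aware aggregation concrete, the following is a minimal sketch and not the RJAN implementation. It assumes that scaled dot‐product similarity can stand in for the view‐to‐view relationship; the function names `mean_pool` and `relation_aware_pool` and the toy features are hypothetical and chosen only for illustration.

```python
import math


def mean_pool(views):
    """Baseline: average the view features element-wise."""
    n = len(views)
    return [sum(col) / n for col in zip(*views)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relation_aware_pool(views):
    """Hypothetical sketch: refine each view by its relation to all views.

    Assumption: scaled dot-product similarity models the view-to-view
    relationship; this is a generic attention scheme, not the RJAN module.
    """
    dim = len(views[0])
    scale = math.sqrt(dim)
    refined = []
    for query in views:
        # Similarity of this view to every view (including itself).
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in views]
        weights = softmax(scores)
        # Refined view feature: attention-weighted sum of all views.
        refined.append(
            [sum(w * v[d] for w, v in zip(weights, views)) for d in range(dim)]
        )
    # Shape descriptor: average of the relation-refined view features.
    return mean_pool(refined)


views = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # toy features for 3 views
baseline = mean_pool(views)
descriptor = relation_aware_pool(views)
```

In this sketch, views that agree with one another reinforce each other before pooling, whereas `mean_pool` treats every view identically regardless of its relation to the rest.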