As an extension of bullying into social networks, cyberbullying has seriously undermined the safety of the online social environment and harmed users' mental health. Taking appropriate measures to detect online bullying content is therefore crucial. Existing studies usually classify a whole text as either cyberbullying or non-cyberbullying. However, they neither fully exploit the interaction of multi-dimensional features nor precisely distinguish between types of cyberbullying. To automatically extract the features of bullying words and better identify fine-grained types of cyberbullying, we propose a fusion capsule network with congruent attention for cyberbullying detection. In the proposed algorithm, a novel similarity weighting scheme based on word2vec is designed to softly highlight bullying features in word embeddings. Meanwhile, to leverage the respective advantages of the multiple extracted subspace features, we construct a novel, extensible congruent attention mechanism that balances the fusion of complex correlations between different subspace representations while retaining the independence of context features. The fused features are updated iteratively with dynamic routing, which aggregates them into fine-grained category capsules for cyberbullying prediction. A series of experiments on a cyberbullying tweets benchmark demonstrates that our architecture matches or exceeds the performance of the compared baseline models, and extensive further experiments confirm the effectiveness of the individual strategies.
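
To make the idea of softly highlighting bullying features concrete, the following is a minimal sketch, not the weighting scheme proposed in this paper. It assumes that each token embedding is rescaled by a weight derived from its cosine similarity to a small set of bullying seed-word vectors. The function names `cosine` and `soft_highlight`, the weight formula `1 + max(0, similarity)`, and the toy 2-dimensional vectors are all hypothetical and stand in for real word2vec embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def soft_highlight(embeddings, seed_vectors):
    """Rescale each token embedding by a similarity-based weight.

    Assumption (not from the paper): weight = 1 + max(0, best cosine
    similarity to any seed vector), so weights lie in [1, 2] and tokens
    unrelated to the seeds are left unchanged rather than suppressed.
    """
    weighted = []
    for vec in embeddings:
        best_sim = max(cosine(vec, seed) for seed in seed_vectors)
        weight = 1.0 + max(0.0, best_sim)
        weighted.append([weight * x for x in vec])
    return weighted


# Toy 2-d vectors standing in for word2vec embeddings (hypothetical values).
tokens = {"you": [1.0, 0.0], "idiot": [0.0, 1.0]}
seeds = [[0.0, 2.0]]  # hypothetical bullying seed-word vector

highlighted = soft_highlight(list(tokens.values()), seeds)
for word, vec in zip(tokens, highlighted):
    print(word, vec)
```

In this toy setting the token aligned with the seed direction is amplified while the orthogonal token is left as it was, which is the sense in which the highlighting is soft: no token is masked out or removed.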