The microservices architecture is widely used in cloud-based application development: an application is built as a set of small, autonomous, functionally independent services. This architectural style is valued for its high cohesion, low coupling, high availability, and excellent scalability. Detecting runtime point anomalies in microservices systems is crucial for improving Quality of Service (QoS), and in practice it is equally important to identify the class of each detected anomaly. However, because a microservices system is a highly dynamic distributed computing architecture, performing real-time anomaly detection across independently deployed microservices is challenging. To address these challenges, we propose the System Anomaly Detection and Multi-Classification based on Multi-Task Feature Fusion Federated Learning (SADMC-MT-FF-FL) framework. First, we introduce a distributed learning framework based on Multi-Task Federated Learning (MT-FL) to construct a multi-classification anomaly detection model for each microservice. Second, to capture complex system anomaly patterns and features during microservice runtime, we develop a feature extractor based on an External Attention mechanism and a Multi-channel Residual Structure (EA-MRS). Finally, we design a Local–Global Feature-based Parallel Knowledge Transfer (LGF-PKT) framework, which uses parallel knowledge transfer to parallelize weight updates for local and global features. To validate the effectiveness of our approach, we conducted comprehensive comparative experiments on the microservices benchmark platforms Sock-Shop and Train-Ticket.
The experimental results on multi-classification system anomaly detection show that SADMC-MT-FF-FL outperforms the best baseline method by 28.3% and 27.8% in Macro F1 and Micro F1 on Train-Ticket, and by 8.8% and 8.6% on Sock-Shop, respectively. We also conducted comparative experiments on three public datasets: SWaT, SMD, and SKAB. On SMD, the F1 score was 0.5% higher than that of centralized methods; on SWaT and SKAB, it was 6% and 2.8% higher, respectively, than that of federated learning-based methods. Source code is available at: https://github.com/icc-lab-xhu1/SADMC-MT-FF-FL.