The development of emerging information technologies such as the Internet of Things (IoT), edge computing, and blockchain has triggered a significant increase in IoT application services and data volume. Ensuring satisfactory service quality for diverse IoT application services with limited network resources has become an urgent issue. Generalized processor sharing (GPS), a central resource scheduling mechanism for differentiated services, is a key technology for implementing on-demand resource allocation. Performance prediction of GPS is a crucial step that aims to capture the actually allocated resources through various queue metrics. Some methods (mainly analytical ones) have attempted to establish upper and lower bounds or approximate solutions. Recently, artificial intelligence (AI) methods, such as deep learning, have been designed to assess performance under self-similar traffic. However, the methods proposed in the literature were developed for specific traffic scenarios with predefined constraints, which limits their real-world applicability. Furthermore, the absence of a benchmark in the literature leads to unfair comparisons of performance prediction methods. To address these drawbacks, this work presents an AI-enabled performance benchmark with comprehensive traffic-oriented experiments showcasing the performance of existing methods. Specifically, three types of methods are employed: traditional approximate analytical methods, traditional machine learning-based methods, and deep learning-based methods. Then, various traffic flows with different settings are collected, and detailed experimental analyses are conducted at both the feature and method levels under different traffic conditions. Finally, insights that may benefit future performance prediction of GPS are derived from the experimental analysis.
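As background for the scheduling mechanism named above, the following is a minimal fluid-model sketch of how GPS divides link capacity: each backlogged session receives service in proportion to its weight. The function name, session labels, weights, and capacity below are illustrative assumptions, not taken from the paper.

```python
# Minimal fluid-model sketch of generalized processor sharing (GPS).
# Assumptions (illustrative, not from the paper): a fixed link capacity,
# static per-session weights, and a known set of backlogged sessions.

def gps_allocation(weights, backlogged, capacity):
    """Share `capacity` among backlogged sessions in proportion to their weights.

    Sessions that are not backlogged receive zero service; their share is
    redistributed among the backlogged sessions (the work-conserving property
    of GPS).
    """
    total = sum(weights[i] for i in backlogged)
    if total == 0:
        # No session is backlogged: the server idles.
        return {i: 0.0 for i in weights}
    return {i: capacity * weights[i] / total if i in backlogged else 0.0
            for i in weights}

# Example: sessions A, B, C with weights 1, 2, 1; only A and B are backlogged,
# so C's share is redistributed between them in the ratio 1:2.
alloc = gps_allocation({"A": 1, "B": 2, "C": 1}, {"A", "B"}, capacity=30.0)
print(alloc)  # A gets 10.0, B gets 20.0, C gets 0.0
```

Performance prediction methods for GPS aim to estimate the queue metrics (e.g., delay and backlog) that result from this weighted sharing under realistic, bursty traffic, where closed-form analysis is difficult.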