Based on the diverse application scenarios at Ant Group, we built the Ant Knowledge Graph Platform (AKGP). It has constructed numerous domain-specific knowledge graphs covering merchants, companies, accounts, products, and more. AKGP manages trillions of structured knowledge graphs, serving search, recommendation, risk control, and other businesses. However, as demand increases for workloads such as graph pattern matching, graph representation learning, and cross-domain knowledge reuse, existing warehouse systems based on relational DBMSs or graph databases can no longer meet the requirements. To address these issues, we propose KGFabric, an industrial-scale knowledge graph management system built on a distributed file system (DFS). KGFabric offers a nearline knowledge storage engine that uses a Semantic-enhanced Programmable Graph (SPG) model, which is compatible with the Labeled Property Graph (LPG) model. Data is stored persistently in a DFS, such as HDFS, through the POSIX file system API, which makes KGFabric suitable for low-cost deployment in multi-cloud environments. KGFabric provides a native graph-based hybrid storage format that can serve as a shared backend for parallel graph computing systems, significantly accelerating multi-workload analysis. Additionally, KGFabric includes a graph fabric framework that minimizes data duplication and guarantees data security. KGFabric can manage peta-scale data and has supported graph fabric and analysis with over 100 billion relations at Ant Group. We conduct experiments on various datasets to evaluate the performance of KGFabric. Compared with popular relational DBMSs and graph databases, KGFabric reduces the storage space for semantic relations by over 90%. Graph fabric performance improves by 21× in real-world workloads, and multi-hop semantic graph analysis performance improves by 100×.