Abstract

Active learning for graph neural networks (GNNs) aims to select B nodes to label so as to maximize GNN performance. Carefully selected labeled nodes can improve GNN performance, which has motivated a line of research. Unfortunately, existing methods either yield inferior GNN performance or cannot scale to large networks. Motivated by these limitations, we present FICOM, an effective and scalable GNN active learning framework. First, we formulate node selection as an optimization problem in which a node is scored by (i) its importance during feature propagation, which we connect to personalized PageRank (PPR), and (ii) the diversity it brings to the embedding space generated by feature propagation. We show that the resulting problem is submodular, so a greedy algorithm provides a $(1-1/e)$-approximate solution. However, a standard greedy algorithm must find the node with the maximum marginal gain of the objective in each iteration, which incurs a prohibitive running cost and does not scale to large datasets. As our main contribution, we present FICOM, an efficient and scalable solution that retains the $(1-1/e)$-approximation guarantee and scales to graphs with millions of nodes on a single machine. The main idea is to adaptively maintain lower and upper bounds on the marginal gain of each node v. In each iteration, we first derive a small subset of candidate nodes and then compute exact scores only for this subset, so that the node with the maximum marginal gain can be found efficiently. Extensive experiments on six benchmark datasets with four GNNs (GCN, SGC, APPNP, and GCNII) show that FICOM consistently outperforms existing active learning approaches on semi-supervised node classification. Moreover, our solution finishes within 5 h on a million-node graph.
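To illustrate the bound-based pruning idea in general terms, the following is a minimal sketch of the classic lazy greedy technique for submodular maximization. It is not the FICOM algorithm. It assumes a simple neighborhood-coverage objective as a stand-in for the PPR-based importance and diversity score, and it keeps only an upper bound per node (the stale marginal gain, which submodularity guarantees can only shrink) rather than both lower and upper bounds. The names `lazy_greedy`, `coverage_gain`, and `neighbors` are hypothetical and chosen for illustration.

```python
import heapq


def coverage_gain(node, covered, neighbors):
    """Exact marginal gain of `node`: how many not-yet-covered nodes it covers."""
    return len(neighbors[node] - covered)


def lazy_greedy(neighbors, budget):
    """Pick up to `budget` nodes with lazy greedy on a coverage objective.

    Assumption: coverage stands in for the paper's objective; any monotone
    submodular score could be plugged in through `coverage_gain`.
    """
    covered = set()
    selected = []
    exact_evals = 0  # counts exact marginal-gain computations

    # Max-heap via negated keys. Gains w.r.t. the empty set are valid
    # upper bounds for every later iteration because of submodularity.
    heap = [(-len(nbrs), node) for node, nbrs in neighbors.items()]
    heapq.heapify(heap)

    while heap and len(selected) < budget:
        _, node = heapq.heappop(heap)
        gain = coverage_gain(node, covered, neighbors)
        exact_evals += 1
        # If the refreshed gain still beats every other node's upper bound,
        # this node is the true argmax and no other node needs evaluating.
        if not heap or gain >= -heap[0][0]:
            selected.append(node)
            covered |= neighbors[node]
        else:
            # Otherwise reinsert it with its tightened bound and try the next.
            heapq.heappush(heap, (-gain, node))

    return selected, covered, exact_evals


# Toy graph: each node maps to the set of nodes it covers (itself included or not).
neighbors = {
    0: {0, 1, 2, 3},
    1: {2, 3, 4},
    2: {4, 5},
    3: {0, 1},
    4: {5},
}
selected, covered, exact_evals = lazy_greedy(neighbors, budget=2)
print(selected, sorted(covered), exact_evals)
```

On this toy input the sketch performs 3 exact gain evaluations, whereas a plain greedy pass would evaluate every remaining node in each round (5 + 4 = 9). The same pruning principle, deriving a small candidate set from bounds before computing exact scores, is what the abstract describes, although the paper's actual bounds and objective differ from this stand-in.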
