Threshold voltage (Vth) assignment is a convenient means of leakage optimization: because leakage power depends exponentially on Vth, it can be reduced by swapping logic cells without any routing effort. However, Vth assignment is an NP-hard problem, which makes it challenging for large-scale circuit design. Machine learning-based approaches have been proposed to solve this problem, aiming to achieve a good tradeoff between leakage power reduction and runtime speedup without introducing new timing violations. In this paper, we propose, for the first time, a leakage power optimization framework based on reinforcement learning (RL) with a graph neural network (GNN). It formulates Vth assignment as an RL process in which the GNN learns the timing and physical characteristics of each circuit instance. In each RL action iteration, multiple instances are selected in a non-overlapping manner to speed up convergence and to decouple timing interdependence along circuit paths, and the corresponding reward is carefully defined to trade off leakage reduction against potential timing violations. The proposed framework was validated on the OpenCores and IWLS 2005 benchmark circuits with TSMC 28 nm technology.
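The abstract does not specify how the non-overlapping selection or the reward is computed. The sketch below is only an illustration under stated assumptions, not the paper's method. It assumes per-instance scores (for example, GNN outputs), a symmetric adjacency map of direct fanin/fanout connections, and a reward built from relative leakage saving minus a penalty on newly introduced total negative slack (TNS). The names `select_non_overlapping`, `reward`, and `penalty` are hypothetical.

```python
from typing import Dict, List, Set


def select_non_overlapping(scores: Dict[str, float],
                           neighbors: Dict[str, Set[str]],
                           k: int) -> List[str]:
    """Hypothetical greedy selection: pick up to k instances, highest score
    first, so that no two picked instances are directly connected.
    Assumes `neighbors` is symmetric (fanin and fanout both listed)."""
    picked: List[str] = []
    blocked: Set[str] = set()
    for inst in sorted(scores, key=scores.get, reverse=True):
        if len(picked) == k:
            break
        if inst in blocked:
            continue  # adjacent to an already picked instance
        picked.append(inst)
        blocked.add(inst)
        blocked |= neighbors.get(inst, set())
    return picked


def reward(leak_before: float, leak_after: float,
           tns_before: float, tns_after: float,
           penalty: float = 10.0) -> float:
    """Hypothetical reward: relative leakage saving minus a penalty on any
    newly introduced negative slack (TNS <= 0; more negative is worse)."""
    saving = (leak_before - leak_after) / leak_before
    new_violation = max(0.0, tns_before - tns_after)
    return saving - penalty * new_violation


if __name__ == "__main__":
    scores = {"u1": 0.9, "u2": 0.8, "u3": 0.7, "u4": 0.1}
    neighbors = {"u1": {"u2"}, "u2": {"u1", "u3"}, "u3": {"u2"}, "u4": set()}
    print(select_non_overlapping(scores, neighbors, 3))  # ['u1', 'u3', 'u4']
    print(reward(10.0, 8.0, 0.0, 0.0))
```

Skipping the direct neighbors of each picked instance means that no two cells swapped in the same iteration share a net. This is one simple way to reduce the timing interdependence the abstract refers to. A real flow might instead block whole path segments.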
Experimental results demonstrate that our work outperforms prior non-analytical and GNN-based methods, achieving an additional 5% to 17% leakage power reduction, and that its results are highly consistent with those of the commercial tool. When the trained RL-based framework is transferred to unseen circuits, it achieves roughly the same leakage optimization results as on seen circuits and speeds up runtime by 5.7× to 8.5× compared with the commercial tool.