As an increasing number of deep-learning-based malware scanners have been proposed, the existing evasion techniques, including code obfuscation and polymorphic malware, are found to be less effective. In this work, we propose a reinforcement learning based semantics-preserving (i.e. functionality-preserving) attack against black-box GNNs (Graph Neural Networks) for malware detection. The key factor of adversarial malware generation via semantic <monospace xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Nops</monospace> insertion is to select the appropriate semantic <monospace xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Nops</monospace> and their corresponding basic blocks. The proposed attack uses reinforcement learning to automatically make these “how to select” decisions. To evaluate the attack, we have trained two kinds of GNNs with three types (e.g., Backdoor, Trojan, and Virus) of Windows malware samples and various benign Windows programs. The evaluation results have shown that the proposed attack can achieve a significantly higher evasion rate than four baseline attacks, namely the binary diversification attack, the semantics-preserving random instruction insertion attack, the semantics-preserving accumulative instruction insertion attack, and the semantics-preserving gradient-based instruction insertion attack.