Phishing is one of the most significant cybersecurity threats today, especially when attackers register Internationalized Domain Name (IDN) homographs for phishing. An IDN homograph attack exploits characters from other scripts and languages that look nearly identical to the characters of a legitimate domain name. Although researchers have proposed several insightful detection methods, most of them focus on typosquatting domain names. The methods that do target IDN homograph attacks either generalize poorly or suffer degraded detection performance due to data imbalance. In this paper, we use a Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) to address the data imbalance problem. We render domain names as images and compute their similarity with Siamese neural networks, which allows our method to determine effectively whether a domain name is an IDN homograph. We evaluate performance on a dataset built from Unicode tables, publicly available homograph tools, and Internet traffic captured from the China Education and Research Network (CERNET) backbone. Experimental results show that the proposed method improves accuracy and reduces the false positive rate when detecting homograph domain names. It can also accurately identify typosquatting in phishing pages.
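To make the notion of an IDN homograph concrete, the sketch below compares a spoofed domain with the real one. It does not implement the image-based Siamese detector or the WGAN-GP data augmentation described above. It only shows that two visually near-identical names are different strings, and that the spoofed name's ASCII-compatible (Punycode, `xn--`) form and the Unicode names of its characters expose the substitution. The helper `describe_idn` and the example domain are hypothetical illustrations, and the sketch relies on Python's built-in `idna` codec (IDNA 2003).

```python
import unicodedata


def describe_idn(domain: str) -> dict:
    """Hypothetical helper (illustration only): return the ASCII-compatible
    (Punycode) form of a domain and every non-ASCII character it contains,
    paired with that character's Unicode name."""
    # Python's built-in "idna" codec converts each label to its xn-- form if needed.
    ace = domain.encode("idna").decode("ascii")
    non_ascii = [(ch, unicodedata.name(ch, "UNKNOWN")) for ch in domain if ord(ch) > 127]
    return {"ace": ace, "non_ascii": non_ascii}


# The first letter is U+0430 CYRILLIC SMALL LETTER A, not the Latin 'a'.
spoof = "\u0430pple.com"
real = "apple.com"

print(spoof == real)                            # False, although both render almost identically
print(describe_idn(spoof)["non_ascii"])         # the Cyrillic letter and its Unicode name
print(describe_idn(spoof)["ace"].startswith("xn--"))  # True: the label needs Punycode
print(describe_idn(real)["ace"])                # apple.com
```

Such character-level checks are easy to evade with mixed-script or same-script look-alikes, which is one reason an approach that compares how names actually look, as the abstract describes, can be attractive.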