Active IPv6 address data can support research and applications for the next generation of the Internet, but gathering such data through active scanning while coping with emerging large-scale aliasing remains a challenge. In this paper, we propose 6Tree, which analyzes known active addresses as seeds to learn their distribution features and suggests search directions for scanners. It treats IPv6 addresses as high-dimensional vectors and performs divisive hierarchical clustering (DHC) on the seed vectors to generate a data structure, named a space tree, that characterizes how values vary in different dimensions. Moreover, it dynamically adjusts search directions based on real-time scanning results and, for the first time, embeds alias detection into the search. Compared with the state-of-the-art method 6Gen, 6Tree has a lower, linear time complexity, finishing training on million-scale data on the order of minutes to support timely application, and it is more robust in maintaining address discovery performance under context variations, including uneven seed sampling and workload division. Using aliased prefixes collected in recent research, 6Tree discovered approximately 4.69 million dealiased active addresses based on 2.74 million seeds, including 1.67 million aliased addresses, by scanning 0.3 billion addresses in one experiment. Additionally, 99.5% of detected aliased addresses fall within the gathered aliased prefixes, and some previously undiscovered aliased prefixes were also found. We design the visualization technique Iris to visualize the address distribution based on discovery results, offering a novel perspective on IPv6 deployment.
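
To make the vector view and the space tree concrete, the following is a minimal Python sketch, not the 6Tree implementation. It assumes each address is represented as 32 hexadecimal digits (nybbles) and that each node is split on the leftmost dimension whose value varies among its seeds; the abstract does not specify the splitting rule, so that choice, along with the names `to_nybbles`, `build_space_tree`, and `max_leaf`, is hypothetical.

```python
import ipaddress
from collections import defaultdict


def to_nybbles(addr):
    """Map an IPv6 address string to a 32-dimensional vector of hex digits (0-15)."""
    value = int(ipaddress.IPv6Address(addr))
    return tuple((value >> (4 * (31 - i))) & 0xF for i in range(32))


def build_space_tree(vectors, max_leaf=1):
    """Recursively split seed vectors (assumed rule: leftmost varying dimension).

    Each node records the dimensions whose values vary among its seeds,
    which is the information a scanner could use to pick search directions.
    """
    variable_dims = [d for d in range(32) if len({v[d] for v in vectors}) > 1]
    node = {"seeds": vectors, "variable_dims": variable_dims, "children": []}
    # Stop when the node is small enough or all of its seeds are identical.
    if len(vectors) <= max_leaf or not variable_dims:
        return node
    split_dim = variable_dims[0]
    groups = defaultdict(list)
    for v in vectors:
        groups[v[split_dim]].append(v)
    node["split_dim"] = split_dim
    node["children"] = [build_space_tree(groups[k], max_leaf) for k in sorted(groups)]
    return node


# Illustrative seeds drawn from the IPv6 documentation prefix 2001:db8::/32.
seeds = ["2001:db8::1", "2001:db8::2", "2001:db8:1::1"]
tree = build_space_tree([to_nybbles(s) for s in seeds])
```

In this sketch each split strictly shrinks the seed set, so the recursion terminates, and every node's `variable_dims` list indicates which address positions remain open for exploration beneath it.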