In the field of evolutionary multi-objective optimization (EMO), the standard practice is to present the final population of an EMO algorithm as the output. However, it has been shown that the final population often contains solutions that are dominated by other solutions generated and discarded in earlier generations. Recently, a novel EMO framework has been developed to address this issue: all non-dominated solutions generated during the evolution are stored in an archive, and a subset of solutions is selected from the archive as the output. The key component of this framework is subset selection from the archive, which typically stores a large number of candidate solutions. However, most existing studies have focused on small candidate solution sets for environmental selection, and no benchmark test suite exists for large-scale subset selection. This study fills this research gap by proposing a benchmark test suite for large-scale subset selection and comparing several representative subset selection algorithms on the proposed test suite. The test suite, together with the benchmarking results, provides a baseline for researchers to understand, use, compare, and develop large-scale subset selection algorithms in the EMO field.
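To make the archive-based idea described above concrete, the following is a minimal illustrative sketch (not the paper's actual implementation) of maintaining a non-dominated archive during the evolution, assuming a minimization problem; the function names `dominates` and `update_archive` are hypothetical.

```python
from typing import List

def dominates(a: List[float], b: List[float]) -> bool:
    """Return True if solution a Pareto-dominates b (minimization):
    a is no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_archive(archive: List[List[float]],
                   candidate: List[float]) -> List[List[float]]:
    """Insert candidate into the archive, keeping only mutually
    non-dominated solutions."""
    if any(dominates(s, candidate) for s in archive):
        return archive  # candidate is dominated; archive unchanged
    # Remove archive members dominated by the candidate, then append it.
    return [s for s in archive if not dominates(candidate, s)] + [candidate]

# Feed a small stream of objective vectors through the archive.
archive: List[List[float]] = []
for sol in [[1.0, 3.0], [2.0, 2.0], [3.0, 1.0], [2.0, 3.0], [0.5, 2.5]]:
    archive = update_archive(archive, sol)
print(archive)  # [[2.0, 2.0], [3.0, 1.0], [0.5, 2.5]]
```

Over a long run such an archive can grow to many thousands of solutions, which is precisely why selecting a small, representative output subset from it becomes a large-scale subset selection problem.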