While supervised neural networks have become the state of the art for identifying the rare strong gravitational lenses in large imaging data sets, their selection remains significantly affected by the large number and diversity of non-lens contaminants. This work systematically evaluates and compares the performance of neural networks in order to move towards a rapid selection of galaxy-scale strong lenses with minimal human input in the era of deep, wide-scale surveys. We used multiband images from PDR2 of the Hyper Suprime-Cam (HSC) Wide survey to build test sets mimicking an actual classification experiment, with 189 securely identified strong lenses from the literature over the HSC footprint and 70\,910 non-lens galaxies in COSMOS covering representative lens-like morphologies. Multiple networks were trained on different sets of realistic strong-lens simulations and non-lens galaxies, with various architectures and data preprocessing, mainly using the deepest $gri$-bands. Most networks reached excellent areas under the Receiver Operating Characteristic (ROC) curve on the test set of 71\,099 objects, and we determined the ingredients needed to optimize the true positive rate (TPR) for a total number of false positives equal to zero or 10 (TPR$_0$ and TPR$_{10}$). The overall performance strongly depends on the construction of the ground-truth training data and typically, but not systematically, improves with our baseline residual network architecture presented in HOLISMOKES VI. TPR$_0$ tends to be higher for ResNets ($\simeq$10--40\%) than for AlexNet-like networks or G-CNNs. Improvements are found when (1) applying random shifts to the image centroids, (2) using square-root scaled images to enhance faint arcs, (3) adding the $z$-band to the otherwise used $gri$-bands, or (4) using random viewpoints of the original images. In contrast, we find no improvement when adding $g - \alpha i$ difference images (where $\alpha$ is a tuned constant) to subtract emission from the central galaxy. The most significant gain is obtained with committees of networks trained on different data sets, with a moderate overlap between their populations of false positives. Nearly perfect invariance to image quality can be achieved by using realistic PSF models in our lens simulation pipeline and by training networks either with a large number of bands or jointly with the PSF and science frames. Overall, we show the possibility to reach a TPR$_0$ as high as 60\% for the test sets under consideration, which opens promising perspectives for the pure selection of strong lenses without human input using the Rubin Observatory and other forthcoming ground-based surveys.
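The TPR$_0$ and TPR$_{10}$ metrics quoted above can be computed directly from the ranked network scores: one lowers the score threshold until the test set yields 0 (or 10) false positives and records the fraction of true lenses recovered at that threshold. Below is a minimal Python sketch of this computation, not the authors' code; the array names and the random example data are illustrative assumptions.

\begin{verbatim}
import numpy as np

def tpr_at_n_false_positives(scores, labels, n_fp):
    """TPR at the loosest threshold allowing at most `n_fp` false positives.

    scores : 1D array of network scores (higher = more lens-like)
    labels : 1D array of ground-truth labels (1 = lens, 0 = non-lens)
    n_fp   : false-positive budget (e.g. 0 or 10)
    """
    order = np.argsort(scores)[::-1]            # rank objects from highest to lowest score
    sorted_labels = np.asarray(labels)[order]
    false_pos = np.cumsum(sorted_labels == 0)   # running count of non-lenses above threshold
    true_pos = np.cumsum(sorted_labels == 1)    # running count of lenses above threshold
    keep = false_pos <= n_fp                    # thresholds still within the FP budget
    if not np.any(keep):
        return 0.0
    return true_pos[keep][-1] / np.sum(sorted_labels == 1)

# Illustrative usage with random scores (replace with real network outputs):
rng = np.random.default_rng(0)
scores = rng.random(71_099)
labels = np.concatenate([np.ones(189), np.zeros(70_910)])
print(tpr_at_n_false_positives(scores, labels, 0),
      tpr_at_n_false_positives(scores, labels, 10))
\end{verbatim}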
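As a further illustration of the preprocessing variants listed in the abstract (random centroid shifts, square-root scaling of the fluxes, and the optional $g - \alpha i$ difference channel), the following sketch shows one plausible implementation; it is an assumed construction, not the paper's pipeline, and the function name, band ordering, and the default value of $\alpha$ are hypothetical.

\begin{verbatim}
import numpy as np

def preprocess_cutout(cutout, rng, max_shift=5, alpha=0.7, add_difference=False):
    """cutout : array of shape (n_bands, ny, nx), e.g. gri HSC image stamps."""
    # (1) random shift of the image centroid by up to `max_shift` pixels per axis
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    shifted = np.roll(cutout, (dy, dx), axis=(1, 2))
    # (2) square-root stretch of the clipped fluxes to enhance faint lensed arcs
    stretched = np.sqrt(np.clip(shifted, 0, None))
    # optional g - alpha*i difference image to subtract light from the central galaxy
    if add_difference:
        g, i = shifted[0], shifted[2]           # assumes band order g, r, i
        diff = np.sqrt(np.clip(g - alpha * i, 0, None))
        stretched = np.concatenate([stretched, diff[None]], axis=0)
    return stretched
\end{verbatim}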