A problem in quantum information theory is to find the experimental setup that maximizes the nonlocality of correlations with respect to some suitable measure, such as the violation of Bell inequalities. The latter, however, has some drawbacks. First and foremost, determining the whole set of Bell inequalities is infeasible even for a few measurements, and so is finding the experimental setup that maximizes their violation. Second, the Bell violation suffers from an ambiguity stemming from the choice of normalization of the Bell coefficients. An alternative measure of nonlocality with a direct information-theoretic interpretation is the minimal amount of classical communication required to simulate nonlocal correlations. When many instances are simulated in parallel, the minimal communication cost per instance is called the nonlocal capacity, and its computation can be reduced to a convex-optimization problem. This quantity can be computed for a larger number of measurements and turns out to be useful for finding the optimal experimental setup. In this paper, focusing on the bipartite case, we present a simple method for maximizing the nonlocal capacity over a given configuration space and, in particular, over a set of possible measurements, yielding the corresponding optimal setup. Furthermore, we show that there is a functional relationship between Bell violation and nonlocal capacity. The method is illustrated with numerical tests and compared with the maximization of the violation of CGLMP-type Bell inequalities for entangled two-qubit and two-qutrit states. Remarkably, the anomaly of nonlocality displayed by qutrits turns out to be even stronger when the nonlocal capacity is used as the measure of nonlocality.