A shared-memory counter is a widely-used and well-studied concurrent object. It supports two operations: An Inc operation that increases its value by 1 and a Read operation that returns its current value. In Jayanti et al (SIAM J Comput, 30(2), 2000), Jayanti, Tan and Toueg proved a linear lower bound on the worst-case step complexity of obstruction-free implementations, from read-write registers, of a large class of shared objects that includes counters. The lower bound leaves open the question of finding counter implementations with sub-linear amortized step complexity. In this work, we address this gap. We show that n-process, wait-free and linearizable counters can be implemented from read-write registers with \(O(\log ^2 n)\) amortized step complexity. This is the first counter algorithm from read-write registers that provides sub-linear amortized step complexity in executions of arbitrary length. Since a logarithmic lower bound on the amortized step complexity of obstruction-free counter implementations exists, our upper bound is within a logarithmic factor of the optimal. The worst-case step complexity of the construction remains linear, which is optimal. This is obtained thanks to a new max register construction with \(O(\log n)\) amortized step complexity in executions of arbitrary length in which the value stored in the register does not grow too quickly. We then leverage an existing counter algorithm by Aspnes, Attiya and Censor-Hillel [1] in which we “plug” our max register implementation to show that it remains linearizable while achieving \(O(\log ^2 n)\) amortized step complexity.