Combinatorial optimization problems arise in many fields. Most of them are NP-hard and are challenging for computers based on the conventional von Neumann architecture. Ising machines built from a number of spins have the potential to solve these problems by emulating the natural annealing process of solid matter. Recent research has explored hardware implementations of Ising machines that accelerate convergence on such problems at room temperature. However, most of them suffer from low scalability and limited parallel processing capability because of their high hardware cost and complexity. In this paper, a novel network-on-chip-based annealing processing architecture (NoCAPA) for a large-scale Ising processor is described. It addresses these issues with a NoC computing paradigm, a distributed storage scheme, and a fully pipelined structure design. Several techniques are developed to further increase convergence speed and reduce hardware resource consumption, including a dynamic multithread parallel update algorithm, a router with merge and deflection abilities, and a unique multiply-accumulate operation. The prototype is implemented on an FPGA with a maximum operating frequency of 200 MHz and runs up to 120.5$\times$ faster than the conventional simulated annealing method when solving the max-cut problem, while supporting high scalability.
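For orientation, the conventional simulated annealing baseline mentioned above can be sketched in software as follows. This is a minimal illustration of single-spin Metropolis updates on the Ising form of max-cut, not the NoCAPA architecture or its parallel update algorithm. The function names, the geometric cooling schedule, and the default parameters are assumptions chosen for the example.

```python
import math
import random


def cut_value(weights, spins):
    """Total weight of edges whose endpoints carry opposite spins."""
    n = len(spins)
    return sum(weights[i][j]
               for i in range(n) for j in range(i + 1, n)
               if spins[i] != spins[j])


def anneal_max_cut(weights, sweeps=200, t_start=2.0, t_end=0.05, seed=0):
    """Sequential simulated annealing for max-cut (illustrative sketch).

    weights: symmetric matrix with a zero diagonal; spins take values +1/-1.
    The cooling schedule and defaults are assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    n = len(weights)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    cut = cut_value(weights, spins)
    best_spins, best_cut = spins[:], cut
    for sweep in range(sweeps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (sweep / max(sweeps - 1, 1))
        for i in range(n):
            # Local field: weighted sum of the neighbouring spins.
            local_field = sum(weights[i][j] * spins[j]
                              for j in range(n) if j != i)
            # Flipping spin i changes the cut by spins[i] * local_field.
            gain = spins[i] * local_field
            # Metropolis rule: always accept non-worsening flips,
            # accept worsening flips with probability exp(gain / t).
            if gain >= 0 or rng.random() < math.exp(gain / t):
                spins[i] = -spins[i]
                cut += gain
                if cut > best_cut:
                    best_spins, best_cut = spins[:], cut
    return best_spins, best_cut


# Example: a unit-weight triangle, whose maximum cut separates one vertex.
triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]
partition, value = anneal_max_cut(triangle)
```

Each spin update in this sketch needs a multiply-accumulate over the neighbouring spins, and the spins are visited one at a time. That sequential inner loop is the cost a parallel hardware Ising processor aims to reduce.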