Turbulence and large computational domains make urban wind flow simulation a computationally challenging problem. To reduce the total simulation time and efficiently use today's new generation of supercomputers with massive processors, scalable parallel solvers are needed. In this work, we present a scalable domain decomposition method based 3D incompressible Navier-Stokes solver for the simulation of unsteady, complex wind flows around urban communities. A large eddy simulation with Smagorinsky modeling is employed for the simulation of the turbulent wind flows. On a fine unstructured grid, we discretize the spatial and temporal dimensions using a stabilized P1−P1 finite element method and an implicit backward Euler finite difference method, respectively. The resulting large-scale nonlinear algebraic system at each timestep is solved by a Newton-Krylov-Schwarz algorithm in parallel. The solver is first validated by a benchmark case where the numerical results are compared with measured data from a wind tunnel. Then, the wind flow field of a realistic actual-scale urban community with a group of buildings in the downtown area of Shenzhen, China, is studied. The numerical results show that the flow field matches with the experimental data for the benchmark problem, and some reasonable, detailed, and complex flow structures are obtained for the urban community simulation case. The scalability of the solver is studied on the TianHe-2A and Sunway TaihuLight supercomputers, and the solver scales up to 16,384 processor cores for a grid with over 20 million elements, which implies that the solver has the potential to perform fast and high-fidelity simulations of large-scale wind flows in complicated computational domains.