Motivated by reflection-in-action in architectural design, this article introduces a spatial synthesis artifact that relies on multi-agent reinforcement learning to address spatial goals with fine-grained control in a simulation. It relies on parameter sharing with proximal policy optimization and a parameterized reward function to train robust agent policies in random environments with random spatial problems. The agents are evaluated in three design cases: a house design with 12 agents in three sites, a museum with 18 agents in an interstitial urban site, and a speculative design of a housing complex with 96 agents on a large empty site. The policies performed well in all the cases and produced morphologically consistent solutions. However, in cases with a larger number of agents, the system largely benefited from a spring layout algorithm for the initialization. Future research will address more complex spatial synthesis problems and mechanisms for human-computer interaction.