Dataflow architectures have inherent advantages in achieving high instruction parallelism and power efficiency for emerging applications such as high-performance computing and deep neural networks. In dataflow computing, instruction execution is driven by data, so the data transfer efficiency of the network on chip (NoC) is a key factor in performance. However, NoC performance degrades as multicast communication becomes more common in many applications. Existing instruction scheduling algorithms for dataflow architectures do not optimize multicast communication between an instruction and its successor instructions, so the routing paths of many multicast packets fork, which wastes bandwidth and can cause network congestion. We propose a sharing path awareness (SPA) algorithm to optimize multicast communication in dataflow architectures. Working through the instruction scheduler, the algorithm lets the routing paths from an instruction to its child nodes be shared, which reduces wasted NoC bandwidth. For applications that use software iteration, we further extend the SPA algorithm with loop optimization to fully exploit instruction-level parallelism. Compared with the state-of-the-art algorithm, the SPA algorithm achieves a 20.21% average performance improvement and a 15.11% reduction in energy consumption on our experimental workloads.
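To illustrate why the placement of successor instructions affects multicast cost, the following minimal sketch counts the distinct NoC links a multicast packet occupies under two placements. It is not the SPA algorithm itself. It assumes a 2D mesh NoC, dimension-ordered (X then Y) routing, and tree-based multicast in which a link shared by several destinations carries a single copy of the packet. The function names `xy_route` and `multicast_links` and all coordinates are hypothetical.

```python
from typing import List, Set, Tuple

Node = Tuple[int, int]    # (x, y) coordinate of a processing element (assumed mesh)
Link = Tuple[Node, Node]  # directed link between two adjacent nodes


def xy_route(src: Node, dst: Node) -> List[Link]:
    """Links traversed by dimension-ordered routing: X first, then Y."""
    links: List[Link] = []
    x, y = src
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def multicast_links(src: Node, dsts: List[Node]) -> Set[Link]:
    """Distinct links used when the packet is replicated only where paths fork."""
    used: Set[Link] = set()
    for dst in dsts:
        used.update(xy_route(src, dst))
    return used


producer = (0, 0)
scattered = [(2, 0), (0, 2), (1, 1)]  # children placed so the paths fork
aligned = [(1, 0), (2, 0), (3, 0)]    # children placed along one shared path

print(len(multicast_links(producer, scattered)))  # 5
print(len(multicast_links(producer, aligned)))    # 3
```

Both placements have the same total hop distance from the producer (6 hops summed over the three children), yet the aligned placement occupies 3 links instead of 5 because every child lies on a single shared path.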