Processing systems for edge computing, the Internet of Things (IoT), and wireless multimedia sensor networks (WMSNs) require parallel and distributed architectures that combine high speed with low power consumption. Most artificial intelligence and machine learning applications must run on efficient distributed architectures with multicast support. This paper proposes TLAM, a high-performance, low-cost network-on-chip (NoC) architecture. TLAM (Two-Level network-on-chip Architecture with Multicast support) combines a hybrid path/tree-based multicast routing algorithm with a dedicated two-level mesh topology. The routing algorithm is based on Hamiltonian paths. The topology is partitioned, and the two-level links provide an efficient infrastructure for low-cost, low-latency transmission of multicast and broadcast messages between partitions. In addition to the multicast routing algorithm, TLAM includes hardware components designed to handle multicast packets and improve performance. The goal is to improve efficiency and performance, especially latency, when handling both unicast and multicast packets. Experimental evaluations, comprising network-level and router-level analyses, were performed with different configurations under various traffic patterns. The results, measured in terms of latency, throughput, area, and power consumption, indicate that TLAM outperforms existing architectures, especially for dense multicast and broadcast traffic.