With CMOS process technology scaling, the mask cost for fabricating nanoscale transistors, contacts, and interconnects has become prohibitively high, especially for low-volume designs. Moreover, higher transistor density has resulted in higher design complexity and larger dies, which have increased design cycle time and degraded process yield. These challenges are forcing low-volume application-specific integrated circuits (ASICs) toward highly suboptimal field-programmable gate arrays (FPGAs). In this article, we propose a new approach for designing and fabricating high-mix, low-volume heterogeneously integrated ASICs, referred to as Microscale Modular Assembled ASIC (M2A2), consisting of: 1) pick-and-place assembly of prefabricated blocks (PFBs), which utilizes the nano-precision placement capabilities developed in jet-and-flash imprint lithography (J-FIL), and 2) an EDA design methodology utilizing unsupervised learning and graph-matching techniques. The EDA methodology leverages existing CAD tool infrastructure for easy adoption into the current EDA ecosystem. The proposed fabrication technology uses a pick-and-place assembly technique to enable nano-precise assembly of PFBs. The PFBs can be fabricated in advanced process nodes and then knitted together on a wafer substrate. Custom-designed, low-cost back-end metal layers can then be created or placed on top of the PFB knitted layer to realize a variety of high-mix, low-volume ASIC designs. Through optimal PFB selection and knitting, M2A2 allows more flexibility in front-end design than earlier approaches such as structured ASICs (sASICs). In this article, the performance of M2A2-based designs is compared with that of other design technologies, namely baseline ASICs, FPGAs, and sASICs, at the 16 nm, 40 nm, and 130 nm CMOS process nodes.
The post-place-and-route (PNR) simulation results obtained over 15 IWLS benchmarks show that the proposed M2A2 designs achieve a 27.11x-34.89x lower power-delay product (PDP) than FPGAs and incur 1.69x-2.36x larger area than the baseline ASICs. The M2A2 designs achieve 15%-68.5% smaller area and 8.5%-52% higher performance than the sASIC methodologies. Moreover, the key fabrication steps in the proposed M2A2 technology are presented. The experimental fabrication results, together with the EDA flow simulations, are promising for the proposed M2A2 technology. Design tradeoffs and process challenges for large-scale deployment of the M2A2 technology are discussed along with their mitigation strategies.