Artificial intelligence systems have proved successful at solving complex problems, and demand for them is growing across an increasing number of applications. The XNOR-Net, a binary neural network, is a promising candidate for implementing these systems in devices with limited power and hardware resources, so efficient hardware implementation of XNOR-Nets is critical and has been widely investigated in recent research. This paper proposes a resource-sharing gate and architecture and a task-scheduling gate and architecture. To reduce power consumption, the proposed nonvolatile logic-in-memory gates are designed using magnetic tunnel junction (MTJ) and carbon nanotube field-effect transistor (CNTFET) devices. Using the proposed resource-sharing gate for hardware implementation of the XNOR-Net reduces the number of flip-flops, power consumption, power-delay product (PDP), and area by at least 75%, 75%, 60%, and 42%, respectively, compared with state-of-the-art counterparts. Likewise, when the proposed task-scheduling design is used to implement the XNOR-Net on 2, 4, and 8 datasets, power consumption is reduced by at least 24%, 54%, and 65%, respectively, compared with state-of-the-art counterparts.
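
For readers unfamiliar with the arithmetic that XNOR-Net hardware accelerates, the following is a minimal software sketch of the standard XNOR-and-popcount dot product used in binary neural networks, assuming weights and activations restricted to +1 and -1 and encoded as bits 1 and 0. It illustrates the general operation only and does not model the proposed gates or architectures; the function names are hypothetical.

```python
def pack_bits(bits):
    """Pack a list of 0/1 bits into one integer (first bit is most significant)."""
    word = 0
    for b in bits:
        word = (word << 1) | b
    return word


def xnor_popcount_dot(a_word, w_word, n):
    """Dot product of two n-element {-1, +1} vectors stored as bit words.

    Bit 1 encodes +1 and bit 0 encodes -1 (an assumed encoding).
    XNOR marks the positions where the operands agree; each agreement
    contributes +1 and each disagreement -1, so the result is
    2 * (number of agreements) - n.
    """
    mask = (1 << n) - 1                  # keep only the n valid bit positions
    agree = ~(a_word ^ w_word) & mask    # bitwise XNOR restricted to n bits
    matches = bin(agree).count("1")      # popcount
    return 2 * matches - n


def reference_dot(a_bits, w_bits):
    """Plain multiply-accumulate over the decoded +1/-1 values, for checking."""
    return sum((2 * a - 1) * (2 * w - 1) for a, w in zip(a_bits, w_bits))


if __name__ == "__main__":
    activations = [1, 0, 1, 1]
    weights = [1, 1, 0, 1]
    fast = xnor_popcount_dot(pack_bits(activations), pack_bits(weights), len(weights))
    assert fast == reference_dot(activations, weights)
    print(fast)
```

The sketch shows why the multiply-accumulate in such a network reduces to bitwise logic plus a counter, which is the kind of operation that gate-level designs for XNOR-Nets target.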