The Internet of Things (IoT) market is projected to reach $1,100B by 2025, with a web of interconnection comprising more than 75 billion IoT devices. Many of these devices contain sensory systems that enable massive data collection from the environment and from people. However, a considerable portion of the captured sensory data is redundant and unstructured. Converting such large volumes of raw data, storing them in volatile memories, transmitting them, and processing them in on-/off-chip processors impose high energy consumption, high latency, and a memory bottleneck at the edge. Therefore, high-speed, low-power, normally-off domain-specific computing architectures should be explored and developed to overcome these issues. Motivated by these concerns, we will focus on cross-layer (device/circuit/architecture/application) co-design of energy-efficient, high-performance processing-in-sensor and processing-in-memory platforms for complex AI and machine learning tasks, bioinformatics, graph processing, and related workloads. We explain how to leverage innovations in circuits and architecture to integrate sensors, memory, and logic, and thereby break the existing memory and power walls.