In this paper, a design approach for architecture- aware nonbinary low-density parity-check convolutional codes (NB-LDPC-CCs) is presented to jointly optimizes the code performance and decoder complexity for achieving high energy-efficiency decoder. The proposed NB-LDPC-CCs not only feature simple structure and low degree, but also compete with other published NB-LDPC-CCs on error-correction capability. With these codes, we present a memory-based layered decoder architecture, where the computation units and the scheduling of the computations are optimized to increase energy efficiency. To demonstrate the feasibility of proposed techniques, a time-varying (50,2,4) NB-LDPC-CC over GF(256) is constructed, and associated decoder is implemented in 90 nm CMOS. The code can reach ${\rm BER}=10^{-5}$ at ${\rm SNR}=0.9 {\rm dB}$ , and support multi code rates with puncturing. Comparing with the state-of-the-art designs, the proposed decoder can save 74% power under the same number of iterations, making it suitable for emerging Internet of Things (IoT) applications.