In this paper we consider the generalization of binary spatially coupled low-density parity-check (SC-LDPC) codes to finite fields GF$(q)$, $q\geq 2$, and develop design rules for $q$-ary SC-LDPC code ensembles based on their iterative belief propagation (BP) decoding thresholds, with particular emphasis on low-latency windowed decoding (WD). We consider transmission over both the binary erasure channel (BEC) and the binary-input additive white Gaussian noise channel (BIAWGNC) and present results for a variety of $(J,K)$-regular SC-LDPC code ensembles constructed over GF$(q)$ using protographs. Thresholds are calculated using protograph versions of $q$-ary density evolution (for the BEC) and $q$-ary extrinsic information transfer analysis (for the BIAWGNC). We show that WD of $q$-ary SC-LDPC codes provides significant threshold gains compared to corresponding (uncoupled) $q$-ary LDPC block code (LDPC-BC) ensembles when the window size $W$ is large enough and that these gains increase as the finite field size $q=2^m$ increases. Moreover, we demonstrate that the new design rules provide WD thresholds that are close to capacity, even when both $m$ and $W$ are relatively small (thereby reducing decoding complexity and latency). The analysis further shows that, compared to standard flooding-schedule decoding, WD of $q$-ary SC-LDPC code ensembles results in significant reductions in both decoding complexity and decoding latency, and that these reductions increase as $m$ increases. For applications with a near-threshold performance requirement and a constraint on decoding latency, we show that using $q$-ary SC-LDPC code ensembles, with moderate $q>2$, instead of their binary counterparts results in reduced decoding complexity.
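As a minimal illustration of what a BP decoding threshold on the BEC means, the following sketch computes it for the simplest special case only: a binary ($q=2$), uncoupled $(J,K)$-regular LDPC-BC ensemble under standard flooding-schedule BP. It is not the protograph-based $q$-ary density evolution used in this paper, which tracks one message distribution per protograph edge and must also handle WD. The sketch iterates the textbook scalar recursion $x_{\ell+1} = \epsilon\,\bigl(1-(1-x_\ell)^{K-1}\bigr)^{J-1}$ and bisects on the channel erasure probability $\epsilon$. The function names, iteration limits, and tolerances are assumptions chosen for illustration.

```python
def bec_de_converges(eps, J, K, max_iters=5000, tol=1e-10):
    """Return True if binary BEC density evolution drives the erasure
    probability below `tol` for a (J,K)-regular ensemble.

    Illustrative helper (hypothetical name); assumes J >= 3.
    """
    x = eps  # erasure probability of variable-to-check messages
    for _ in range(max_iters):
        # Check update: a message is erased unless all other K-1 inputs are known.
        # Variable update: erased only if the channel and all other J-1 inputs are erased.
        x_new = eps * (1.0 - (1.0 - x) ** (K - 1)) ** (J - 1)
        if x_new < tol:
            return True
        if abs(x - x_new) < 1e-14:
            # Stalled at a nonzero fixed point: decoding fails.
            return False
        x = x_new
    # Iteration budget exhausted: conservatively report failure.
    return False


def bec_bp_threshold(J, K, steps=30):
    """Estimate the largest erasure probability for which density
    evolution converges, by bisection on eps in [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if bec_de_converges(mid, J, K):
            lo = mid  # decodable: threshold is at least mid
        else:
            hi = mid  # not decodable: threshold is below mid
    return lo


if __name__ == "__main__":
    print(f"(3,6)-regular binary BEC BP threshold ~ {bec_bp_threshold(3, 6):.4f}")
```

For the $(3,6)$-regular ensemble this should return a value near the well-known binary BP threshold of about $0.429$, noticeably below the rate-$1/2$ capacity limit of $0.5$. That gap is the one that spatial coupling and larger field sizes aim to close.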