Long Short-Term Memory networks (LSTMs) are widely used for on-device time series analysis in embedded systems, particularly for processing sensor data streams, yet deploying them on resource-constrained embedded devices remains challenging. In response, we introduce a novel parameterized architecture for LSTM accelerators designed specifically for embedded Field-Programmable Gate Arrays (FPGAs). Its design choices, including computationally efficient activation functions and a pipelined Arithmetic Logic Unit (ALU) that optimizes the clock frequency, improve energy efficiency while preserving adaptability across diverse application scenarios. The architecture's configurable parameters allow tailored optimization: Digital Signal Processor (DSP) slices can optionally be used for the ALUs, and activation functions can be implemented selectively. In our empirical evaluations on the Spartan-7 XC7S15 FPGA, the approach achieves a 2.33× improvement in energy efficiency over previous solutions. We also examine how the type of memory resource relates to energy efficiency across LSTM model sizes. Even with a 9× increase in the hidden size of the LSTM cell, our accelerator maintains an energy efficiency of 10.03 GOP/s/W, a decrease of only 14.65%. The current design is, however, not yet optimized for larger devices such as the Spartan-7 XC7S25 and XC7S50; on these, timing constraints rather than resource limitations hinder scaling, which marks an area for future optimization.
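To make the energy-efficiency figures concrete, the following minimal Python sketch shows how a GOP/s/W value can be computed and what baseline the reported numbers imply. It assumes that the 14.65% decrease is measured relative to the efficiency of the smaller model, which the abstract does not state explicitly; the function names `gops_per_watt` and `implied_baseline` are hypothetical and serve illustration only.

```python
def gops_per_watt(operations: float, latency_s: float, power_w: float) -> float:
    """Energy efficiency in GOP/s/W: giga-operations per second, per watt.

    Numerically this equals giga-operations per joule.
    """
    if latency_s <= 0 or power_w <= 0:
        raise ValueError("latency_s and power_w must be positive")
    return operations / 1e9 / latency_s / power_w


def implied_baseline(efficiency_after: float, relative_drop: float) -> float:
    """Efficiency before a relative drop, given the value after the drop.

    Assumption: the drop is expressed relative to the baseline value.
    """
    if not 0.0 <= relative_drop < 1.0:
        raise ValueError("relative_drop must be in [0, 1)")
    return efficiency_after / (1.0 - relative_drop)


if __name__ == "__main__":
    # Figures taken from the abstract: 10.03 GOP/s/W after a 14.65% decrease.
    baseline = implied_baseline(10.03, 0.1465)
    print(f"Implied efficiency of the smaller model: {baseline:.2f} GOP/s/W")
```

Under this reading, the smaller configuration would operate at roughly 11.75 GOP/s/W, so the 9× larger hidden size costs about 1.7 GOP/s/W.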