Detecting infrared small targets under low signal-to-clutter ratio (SCR) and complex background conditions remains a challenging and active research topic. In this article, a spatial-temporal feature-based detection framework is proposed. First, several factors that affect the detection of small targets are analyzed, including the small sample of infrared targets, the sensitivity to target size, and the conventional sample selection strategy. In addition, a small intersection over union (IOU) strategy is proposed, which helps to solve the false convergence and sample misjudgment problems. Second, to address the difficulties caused by the target's dim information and the complex background, an end-to-end spatial-temporal feature extraction and target detection framework based on an interframe energy accumulation (IFEA) enhancement mechanism is proposed. This framework helps to enhance the target's energy, suppress strong spatially nonstationary clutter, and detect dim small targets. Experimental results show that, with the small IOU strategy and the IFEA mechanism, the proposed multiple-frame detection framework performs better than several popular deep learning (DL)-based detection algorithms.
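The two ideas named above can be illustrated with a minimal sketch. It is not the article's implementation: the helper names `iou`, `shifted_iou`, and `accumulate`, the box sizes, and the one-pixel shift are all hypothetical, and plain frame averaging stands in for the IFEA mechanism only to show why accumulating energy across frames can help. The first part shows how strongly IOU depends on target size, which is one reason a smaller IOU threshold may suit sample selection for small targets. The second part assumes registered frames and a static target.

```python
def iou(box_a, box_b):
    """IOU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def shifted_iou(size, shift=1.0):
    """IOU between a size x size box and the same box shifted diagonally."""
    truth = (0.0, 0.0, size, size)
    candidate = (shift, shift, size + shift, size + shift)
    return iou(truth, candidate)


def accumulate(frames):
    """Pixel-wise mean of equally sized 2-D frames (lists of rows).

    Assumption: frames are already registered and the target does not move,
    so its energy adds coherently while zero-mean clutter tends to cancel.
    """
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [sum(frame[r][c] for frame in frames) / n for c in range(cols)]
        for r in range(rows)
    ]


# A one-pixel localization error costs a 3x3 target far more IOU
# than it costs a 30x30 target.
for size in (3, 30):
    print(size, round(shifted_iou(size), 3))
```

Under these assumptions, a 3x3 candidate that is off by one pixel falls below a 0.5 IOU threshold and would be labeled as a negative sample, while a 30x30 candidate with the same error stays well above it.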