Rising global wind power capacity urgently requires effective real-time turbine monitoring. Complex installations, high maintenance costs, and the occasional need for turbine shutdown hinder conventional monitoring methods. Acoustic-based techniques, while cost-efficient for health assessments, currently face challenges with detection precision and signal processing capabilities in complex acoustic environments. To overcome these challenges, this study proposes a low-cost, AI-driven acoustic-based health monitoring system framework for wind turbine blades, featuring enhanced detection accuracy and signal processing capabilities in complex sound environments. Firstly, a microphone array is employed to capture blade sounds with rich spatial attributes, while beamforming algorithms with a fixed orientation integrate prior spatial information. Furthermore, to further strengthen signal processing and fault detection capabilities, the Transformer model based on self-attention mechanisms is employed for feature extraction and fault diagnosis. An actual wind turbine blade health monitoring system is designed and developed to validate the usability of the proposed framework. Experimental results based on a scaled-down wind turbine model validate the proposed algorithm's effectiveness and practicality, presenting an effective approach in the wind turbine blade health monitoring domain.