This paper proposes a system for real-time estimation of the calorific value of mixed straw fuels based on an improved U-Net semantic segmentation model. The system addresses the uncertainty in heat and power output per unit time in combined heat and power generation (CHPG) systems that is caused by fluctuations in the calorific value of straw fuels. It combines an industrial camera, which captures images of the mixed straw fuel, with a moisture detector and quality sensors. The improved U-Net performs semantic segmentation of the images, from which the proportion of each straw type is calculated. The network makes three changes to the standard U-Net: it adds a self-attention mechanism to the skip connection of the final encoder layer, replaces standard convolutions with depthwise separable convolutions, and replaces the convolutional bottleneck layers with a Transformer encoder. These changes allow the model to retain high segmentation accuracy and strong generalization capability while maintaining good real-time performance. The straw-type proportions obtained from segmentation are combined with the moisture content and quality data, and the calorific value of the mixed fuel is estimated in real time from the elemental composition of each straw type. Validation on images captured at an operating thermal power plant shows that, under the same conditions, the proposed model loses only 0.2% in accuracy compared with the traditional U-Net, while the number of parameters is reduced by 74% and inference speed is improved by 23%.
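To make the estimation step described above concrete, the following is a minimal Python sketch of how a segmentation mask could be turned into a calorific value estimate. It is an illustration under stated assumptions, not the method of the paper: the class ids, the elemental mass fractions, and the function names are hypothetical placeholders; pixel-area proportions are assumed to stand in for mass fractions; the Dulong approximation is used as a stand-in for whatever elemental-composition formula the system actually applies; and moisture is handled by a simple wet-basis scaling.

```python
from collections import Counter

# Hypothetical class ids with dry-basis elemental mass fractions (C, H, O, S).
# The numbers are illustrative placeholders, not measured values.
ELEMENTAL = {
    1: {"C": 0.45, "H": 0.06, "O": 0.42, "S": 0.001},  # placeholder straw type A
    2: {"C": 0.47, "H": 0.06, "O": 0.40, "S": 0.001},  # placeholder straw type B
}


def class_proportions(mask, background=0):
    """Return the share of non-background pixels belonging to each class."""
    counts = Counter(
        label for row in mask for label in row if label != background
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("mask contains no straw pixels")
    return {label: n / total for label, n in counts.items()}


def dulong_hhv(C, H, O, S):
    """Dulong approximation of the higher heating value in MJ/kg.

    Inputs are dry-basis mass fractions. This formula is an assumed
    stand-in; the source does not state which correlation is used.
    """
    return 33.823 * C + 144.25 * (H - O / 8.0) + 9.419 * S


def mixed_calorific_value(mask, moisture):
    """Estimate the wet-basis heating value (MJ/kg) of the imaged mixture.

    Assumes pixel-area proportions approximate mass fractions and that
    `moisture` is the wet-basis moisture mass fraction in [0, 1).
    """
    proportions = class_proportions(mask)
    hhv_dry = sum(
        share * dulong_hhv(**ELEMENTAL[label])
        for label, share in proportions.items()
    )
    return hhv_dry * (1.0 - moisture)


# Toy 2x3 mask: 0 = background, 1 and 2 = the two placeholder straw types.
example_mask = [
    [0, 1, 1],
    [2, 1, 0],
]
example_value = mixed_calorific_value(example_mask, moisture=0.10)
```

In a real deployment the mask would come from the segmentation network's per-pixel class predictions, and the area-to-mass conversion would need calibration against the sensor data, since straw types differ in bulk density.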