Traffic signal controllers (TSCs) play a crucial role in managing traffic flow in urban areas. Recently, reinforcement learning (RL) models have received considerable attention for TSC, with promising results. However, these RL-TSC models still need improvement before real-world deployment, because performance metrics such as fair traffic scheduling and air quality impact have seen only limited exploration. In this work, we introduce a constrained multi-objective RL model that minimizes multiple constrained objectives while achieving a higher expected reward. Furthermore, our proposed RL strategy integrates peak and average constraint models into the RL problem formulation with maximum entropy off-policy models. We apply this strategy to a single TSC and to a network of TSCs. As part of this constrained RL-TSC formulation, we discuss fairness and air quality parameters as constraints for the closed-loop control system optimization model at TSCs, called FAirLight. Our experimental analysis shows that FAirLight achieves good traffic flow performance in terms of average waiting time while being fair and environmentally friendly. Our method outperforms the baseline models and gives a more comprehensive view of the real-world applicability of RL-TSC.