This paper considers a transmit beamforming design that jointly improves the communication and sensing functionalities of an integrated sensing and communication (ISAC) system, in which a dual-functional base station (BS) serves a set of communication users (UEs) while sensing potential targets (TGs). To this end, a multi-objective beamforming optimization problem is formulated to maximize a weighted linear combination of the communication energy efficiency (EE) and the sensing beampattern gains. Using the normalized weighted sum method, the multi-objective problem is converted into a single-objective problem. To deal with the non-convexity of this problem and the continuous beamforming space, a novel deep contextual bandit (DCB) scheme inspired by the soft actor-critic method, termed S-DCB, is proposed, in which the channel state information of the UEs serves as the context. Both the reward and policy functions are approximated to solve the problem. Simulation results indicate that the multi-objective function enables an adjustable trade-off between the communication and sensing functionalities. In addition, the effectiveness of the proposed S-DCB algorithm is verified through a comprehensive comparison with the state-of-the-art DCB-DDPG scheme and upper-bound benchmarks.
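The normalized weighted sum step mentioned above can be illustrated with a minimal sketch. It is not the paper's formulation: the function name, the reference values `ee_ref` and `gain_ref` used for normalization, and the single weight in [0, 1] are all assumptions made for illustration, and the numbers are arbitrary.

```python
def normalized_weighted_sum(ee, beam_gain, ee_ref, gain_ref, weight):
    """Scalarize two objectives into one via a normalized weighted sum.

    ee_ref and gain_ref are hypothetical normalization constants, for
    example the best value of each objective when it is optimized alone.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if ee_ref <= 0 or gain_ref <= 0:
        raise ValueError("reference values must be positive")
    # Each objective is divided by its reference so both terms are unitless
    # and comparable before they are combined.
    return weight * (ee / ee_ref) + (1.0 - weight) * (beam_gain / gain_ref)


# Sweep the weight from sensing-only (0.0) to communication-only (1.0).
# The EE and gain values below are arbitrary illustrative numbers.
scores = [
    normalized_weighted_sum(2.0, 30.0, 4.0, 40.0, w) for w in (0.0, 0.5, 1.0)
]
```

Under these assumptions, the weight sets how much the communication term counts relative to the sensing term, which is how a single scalar objective can expose an adjustable trade-off.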