Millimeter wave (mmWave) enabled unmanned aerial vehicle (UAV) communications, which offer excellent flexibility and high data transmission capabilities, are widely regarded as an essential element of next-generation non-terrestrial networks. In this paper, we investigate the 3D trajectory design, beamwidth, and power allocation problems for mmWave UAV communication systems. We first formulate a non-convex optimization problem to maximize the total normalized spectral efficiency (NSE) of all ground terminals (GTs). We then propose a deep reinforcement learning (DRL) based framework, termed the intelligent flying-beamformer, to solve the formulated non-convex problem. The framework consists of two sequential phases: 3D UAV trajectory design, followed by joint optimization of beamwidth and power allocation. In particular, the off-policy deep deterministic policy gradient (DDPG) algorithm and the on-policy proximal policy optimization (PPO) algorithm are used to process the complex data in the first and second phases, respectively. Simulation results verify the effectiveness of the proposed intelligent flying-beamformer in improving system capacity. Specifically, numerical results show that optimizing the beamwidth can significantly improve the spectral efficiency of mmWave UAV networks.
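To give some intuition for why beamwidth is worth optimizing, the following toy sketch sweeps the beamwidth of a single beam over three GTs. It is not the formulation used in this paper: the idealized flat-top antenna gain of 2π/θ, the power-law path loss, the equal power split, the noise level, and all function and variable names are assumptions made purely for illustration, and the metric is a plain sum spectral efficiency rather than the NSE.

```python
import math

# Hypothetical scenario (all values are illustrative assumptions):
# three GTs seen from the UAV at these angular offsets and distances.
GT_ANGLES_RAD = [-0.3, 0.0, 0.3]
GT_DISTANCES_M = [100.0, 100.0, 100.0]


def flat_top_gain(beamwidth_rad):
    """Main-lobe gain of an idealized 2D flat-top antenna with no side lobes."""
    return 2 * math.pi / beamwidth_rad


def spectral_efficiency(power_w, beamwidth_rad, distance_m,
                        noise_w=1e-13, path_loss_exp=2.0):
    """Shannon spectral efficiency (bit/s/Hz) under a simple power-law path loss."""
    gain = flat_top_gain(beamwidth_rad)
    snr = power_w * gain * distance_m ** (-path_loss_exp) / noise_w
    return math.log2(1 + snr)


def total_se(beamwidth_rad, total_power_w=0.1, beam_center_rad=0.0):
    """Sum spectral efficiency over the GTs that fall inside the main lobe.

    Transmit power is split equally among covered GTs (an assumption,
    standing in for a real power allocation policy).
    """
    covered = [d for a, d in zip(GT_ANGLES_RAD, GT_DISTANCES_M)
               if abs(a - beam_center_rad) <= beamwidth_rad / 2]
    if not covered:
        return 0.0
    per_gt_power = total_power_w / len(covered)
    return sum(spectral_efficiency(per_gt_power, beamwidth_rad, d)
               for d in covered)


if __name__ == "__main__":
    for bw in (0.2, 0.7, 2.0):
        print(f"beamwidth={bw:.1f} rad -> total SE={total_se(bw):.2f} bit/s/Hz")
```

In this toy setting, a beam that is too narrow misses the outer GTs, while a beam that is too wide wastes antenna gain, so an intermediate beamwidth yields the highest total. A learned policy would have to balance the same trade-off jointly with power allocation and the UAV position.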