Many researchers have developed deep learning models for predicting clinical dose distributions and Pareto optimal dose distributions. Models that predict Pareto optimal dose distributions can generate optimal plans in real time from anatomical structures and static beam orientations. However, Pareto optimal dose prediction for intensity-modulated radiation therapy (IMRT) prostate planning with variable beam numbers and orientations has not yet been investigated. We propose a deep learning model that predicts Pareto optimal dose distributions for any given set of beam angles, taking the beam angles and the patient anatomy as network inputs. We implement and compare two deep learning networks that differ in how they represent the beam configuration. We generated Pareto optimal plans for 70 patients with prostate cancer, using fluence map optimization to generate 500 IMRT plans that sampled the Pareto surface for each patient, for a total of 35,000 plans. The two models, Models I and II, used the same anatomical structures (the planning target volume (PTV), organs at risk (OARs), and body) but represented the beam angles differently. Model I takes the beam angles directly as a second input to the network, encoded as a binary vector. Model II converts the beam angles into beam doses that are conformal to the PTV. We divided the 70 patients into 54 training, 6 validation, and 10 testing patients, yielding 27,000 training, 3,000 validation, and 5,000 testing plans. Mean squared error (MSE) was used as the loss function. We trained the networks using the Adam optimizer with a default learning rate of 0.01.
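As an illustration of the Model I input and the training loss described above, the following is a minimal sketch, not the authors' implementation. The 2-degree angular bin width, the function names, and the example beam angles are assumptions made for this example only; the abstract states only that beam angles enter the network as a binary vector and that MSE is the loss.

```python
from typing import List, Sequence

PLANS_PER_PATIENT = 500  # Pareto-surface samples per patient, as stated in the text


def encode_beam_angles(angles_deg: Sequence[float],
                       bin_width_deg: float = 2.0) -> List[int]:
    """Return a binary vector with a 1 in every angular bin that holds a beam.

    The bin width is a hypothetical choice; the source does not give one.
    """
    n_bins = int(round(360.0 / bin_width_deg))
    vector = [0] * n_bins
    for angle in angles_deg:
        # Wrap the angle into [0, 360) and guard against edge rounding.
        index = int((angle % 360.0) // bin_width_deg) % n_bins
        vector[index] = 1
    return vector


def mean_squared_error(predicted: Sequence[float],
                       target: Sequence[float]) -> float:
    """Mean of squared voxel-wise differences between two dose arrays."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def plans_in_split(n_patients: int) -> int:
    """Number of plans contributed by a patient subset."""
    return n_patients * PLANS_PER_PATIENT


if __name__ == "__main__":
    beams = encode_beam_angles([0, 72, 144, 216, 288])  # hypothetical 5-beam setup
    print(len(beams), sum(beams))
    print(plans_in_split(54), plans_in_split(6), plans_in_split(10))
```

The split sizes follow directly from the patient counts: 54, 6, and 10 patients at 500 plans each give 27,000, 3,000, and 5,000 plans.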
We evaluated the models' performance by comparing their predicted dose distributions with the ground truth (Pareto optimal) dose distributions, in terms of dose volume histogram (DVH) plots and evaluation metrics such as PTV D98, D95, D50, D2, Dmax, Dmean, Paddick conformation number, R50, and homogeneity index. Our deep learning models predicted voxel-level dose distributions that closely matched the ground truth dose distributions, and the resulting DVHs also closely matched the ground truth. Evaluation metrics such as PTV statistics, dose conformity, dose spillage (R50), and homogeneity index also confirmed the accuracy of the PTV curves on the DVH. Quantitatively, Model I's prediction errors of 0.043 (conformation), 0.043 (homogeneity), 0.327 (R50), 2.80% (D95), 3.90% (D98), 0.6% (D50), and 1.10% (D2) were lower than those of Model II, which obtained 0.076 (conformation), 0.058 (homogeneity), 0.626 (R50), 7.10% (D95), 6.50% (D98), 8.40% (D50), and 6.30% (D2). Model I also outperformed Model II in terms of the mean dose error and the max dose error on the PTV, bladder, rectum, left femoral head, and right femoral head. With our models, treatment planners can control in real time the trade-offs between the PTV and OAR weights, as well as the number and configuration of beams. Our dose prediction methods provide a stepping stone to building automatic IMRT treatment planning.
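The evaluation metrics named above can be sketched as follows. This is a hedged illustration, not the authors' evaluation code: it assumes equal-volume voxels stored as flat lists, uses the widely used Paddick definition CN = TV_PIV² / (TV × PIV), takes R50 as the 50% prescription isodose volume divided by the PTV volume, and assumes the homogeneity index is (D2 − D98) / D50, one common definition that the abstract does not confirm. All function names are hypothetical.

```python
import math
from typing import Sequence


def dose_at_volume(doses: Sequence[float], percent: float) -> float:
    """D_x: minimum dose received by the hottest `percent` of the voxels."""
    ranked = sorted(doses, reverse=True)
    k = max(1, math.ceil(percent * len(ranked) / 100))
    return ranked[k - 1]


def homogeneity_index(ptv_doses: Sequence[float]) -> float:
    """Assumed definition: (D2 - D98) / D50."""
    d2 = dose_at_volume(ptv_doses, 2)
    d98 = dose_at_volume(ptv_doses, 98)
    d50 = dose_at_volume(ptv_doses, 50)
    return (d2 - d98) / d50


def paddick_conformation_number(dose: Sequence[float],
                                ptv_mask: Sequence[bool],
                                prescription: float) -> float:
    """CN = TV_PIV^2 / (TV * PIV), counted in voxels."""
    tv = sum(1 for m in ptv_mask if m)                      # target volume
    piv = sum(1 for d in dose if d >= prescription)         # prescription isodose volume
    tv_piv = sum(1 for d, m in zip(dose, ptv_mask) if m and d >= prescription)
    if tv == 0 or piv == 0:
        raise ValueError("empty target or prescription isodose volume")
    return tv_piv ** 2 / (tv * piv)


def r50(dose: Sequence[float], ptv_mask: Sequence[bool],
        prescription: float) -> float:
    """Dose spillage: 50% isodose volume divided by the PTV volume."""
    tv = sum(1 for m in ptv_mask if m)
    if tv == 0:
        raise ValueError("empty target volume")
    half_volume = sum(1 for d in dose if d >= 0.5 * prescription)
    return half_volume / tv


if __name__ == "__main__":
    # Toy 8-voxel example with doses normalized to the prescription.
    dose = [1.0, 1.0, 1.0, 1.0, 0.6, 0.6, 0.2, 0.2]
    mask = [True, True, True, False, False, False, False, False]
    print(paddick_conformation_number(dose, mask, 1.0))  # 0.75
    print(r50(dose, mask, 1.0))                          # 2.0
```

A prediction error for any of these metrics can then be taken as the absolute difference between the value computed on the predicted dose and the value computed on the ground truth dose.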