Spike prediction models can effectively predict downstream spike trains from upstream neural activity for use in neural prostheses. Such prostheses could potentially restore damaged neural communication pathways by using the predicted patterns to guide electrical stimulation of downstream regions. Because the ground truth of downstream neural activity is unavailable in subjects with such damage, reinforcement learning (RL) with behavior-level rewards is necessary for model training. However, existing models impose no constraints on the generated firing patterns and neglect correlations in neural activity. As a result, model outputs can deviate greatly from the natural range of neural activity, raising concerns for clinical use. This study proposes a neural manifold constraint to address this problem, shaping RL-generated spike trains in the feature space. The constraint terms describe the first- and second-order statistics of the neural manifold, estimated from neural recordings made while subjects move freely. The models can then be optimized within the neural manifold by behavioral reinforcement. We test the method by predicting primary motor cortex (M1) spikes from medial prefrontal cortex (mPFC) spikes while rats perform a two-lever discrimination task. Results show that the neural activity generated by the constrained models resembles real M1 recordings. Compared with unconstrained models, our approach achieves similar behavioral success rates but reduces the mean squared error of neural firing by 61%. The constraints also increase the models' robustness across data segments and induce realistic neural correlations. Our method provides a promising tool for restoring transregional communication with high behavioral performance and more realistic microscopic patterns.
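
To make the idea of a constraint built from first- and second-order statistics concrete, the following is a minimal sketch of one possible form such a term could take. It is not the formulation used in this study: the abstract does not specify the feature space or the exact penalty, so the sketch assumes that binned spike counts stand in for the features, that the penalty is a squared difference between means and between covariances, and that the names `mean_and_cov`, `manifold_penalty`, `w_mean`, and `w_cov` are hypothetical.

```python
from statistics import fmean


def mean_and_cov(samples):
    """Return the per-neuron mean vector and the sample covariance matrix.

    samples: list of time bins, each a list of per-neuron spike counts.
    (Assumption: raw binned counts stand in for the paper's feature space.)
    """
    n = len(samples)
    d = len(samples[0])
    mu = [fmean(row[j] for row in samples) for j in range(d)]
    cov = [
        [
            sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in samples) / (n - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]
    return mu, cov


def manifold_penalty(generated, ref_mu, ref_cov, w_mean=1.0, w_cov=1.0):
    """Hypothetical penalty: squared distance between the statistics of the
    generated activity and reference statistics from real recordings."""
    mu, cov = mean_and_cov(generated)
    d = len(ref_mu)
    # First-order term: mismatch in mean firing per neuron.
    mean_term = sum((a - b) ** 2 for a, b in zip(mu, ref_mu))
    # Second-order term: mismatch in pairwise covariance between neurons.
    cov_term = sum(
        (cov[i][j] - ref_cov[i][j]) ** 2 for i in range(d) for j in range(d)
    )
    return w_mean * mean_term + w_cov * cov_term


# Toy data (invented for illustration): 4 time bins, 2 neurons.
recorded = [[2, 0], [3, 1], [1, 0], [4, 2]]
generated = [[9, 9], [0, 0], [9, 0], [0, 9]]

ref_mu, ref_cov = mean_and_cov(recorded)
print(manifold_penalty(recorded, ref_mu, ref_cov))   # zero: statistics match
print(manifold_penalty(generated, ref_mu, ref_cov))  # positive: statistics differ
```

In an RL setting, a term of this kind could, for example, be subtracted from the behavior-level reward so that the model is discouraged from producing firing patterns whose means and correlations fall far from those of recorded activity. How the constraint is actually combined with the reward in this study is not stated in the abstract.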