Real-time applications such as interactive virtual reality environments need low-complexity simulation of room impulse responses in highly complex virtual scenes, which remains a challenging problem. In particular, simulating late reverberation with physically based acoustic modeling is computationally expensive, whereas early reflections can be modeled by simpler techniques such as the image source method. To address this computational cost, we propose Echo2Reverb, a neural network-based hybrid artificial reverberation framework that generates late reverberation from given early reflections. The proposed model controls both the temporal texture (echo density) and the frequency-dependent energy decay (spectral energy distribution) of the generated reverberation: it extracts spectral and echo-related features and uses the estimated features to filter sampled sparse sequences and Gaussian noise. To support end-to-end training with controlled echo density, we propose a differentiable approximation of the normalized echo density profile. We train and test the model not only on nearly diffuse late reverberation but also on late reverberation with prominent distinct echoes, such as flutter echoes in narrow corridors. Evaluation results show that the proposed model can accurately reproduce the frequency-dependent energy decay and temporal texture of a room impulse response using only early reflections.
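
To make the notion of a normalized echo density profile concrete, the following is a minimal NumPy sketch under stated assumptions. It uses the common windowed definition: the weighted fraction of samples in a sliding window whose magnitude exceeds the window's standard deviation, divided by the value expected for Gaussian noise, erfc(1/√2) ≈ 0.317. The function name `echo_density_profile`, the Hann window, the window length, and the `sharpness` parameter are illustrative choices, not taken from the paper. In particular, the sigmoid used as a smooth stand-in for the hard threshold is only one plausible relaxation and is not necessarily the approximation the authors propose.

```python
import math

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Expected fraction of zero-mean Gaussian samples lying beyond one standard deviation.
ERFC_NORM = math.erfc(1.0 / math.sqrt(2.0))  # about 0.3173


def echo_density_profile(h, win_len=257, sharpness=None):
    """Normalized echo density profile of an impulse response `h`.

    With sharpness=None the hard indicator |h| > sigma is used.
    With a positive sharpness the indicator is replaced by a sigmoid,
    an assumed smooth surrogate (hypothetical, for illustration only).
    `win_len` must be odd and smaller than twice the length of `h`.
    """
    h = np.asarray(h, dtype=float)
    w = np.hanning(win_len)
    w = w / w.sum()  # weights sum to one
    half = win_len // 2
    padded = np.pad(h, half, mode="reflect")
    frames = sliding_window_view(padded, win_len)  # shape (len(h), win_len)

    # Weighted standard deviation of each window (zero-mean assumption).
    sigma = np.sqrt((frames ** 2 * w).sum(axis=1, keepdims=True))
    excess = np.abs(frames) - sigma

    if sharpness is None:
        outlier = (excess > 0).astype(float)
    else:
        z = sharpness * excess / (sigma + 1e-12)
        outlier = 1.0 / (1.0 + np.exp(-z))

    # Weighted outlier fraction, normalized so that Gaussian noise gives about 1.
    return (outlier * w).sum(axis=1) / ERFC_NORM


rng = np.random.default_rng(0)
noise = rng.standard_normal(4000)   # diffuse-like signal
sparse = np.zeros(4000)
sparse[::100] = 1.0                 # a few isolated echoes

ned_noise = echo_density_profile(noise)
ned_sparse = echo_density_profile(sparse)
ned_soft = echo_density_profile(noise, sharpness=20.0)
```

Under this definition, values near 1 indicate a diffuse, noise-like texture, while values near 0 indicate a few isolated echoes. For actual gradient-based training, the same computation would have to be written in an automatic differentiation framework; the NumPy version only illustrates the quantity being approximated.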