A common approach to obtaining the computational benefits of depth in recurrent neural networks (RNNs) is to stack multiple recurrent layers hierarchically. However, these performance gains come at the cost of difficult optimization, since hierarchical RNNs (HRNNs) are deep both hierarchically and temporally. Prior work has separately highlighted the importance of highways (direct shortcut connections) for learning deep hierarchical representations and for capturing long temporal dependencies, yet little effort has been made to unify these findings into a single framework for learning deep HRNNs. We propose the hierarchical recurrent highway network (HRHN), which embeds highways within both the hierarchical and the temporal structure of the network, enabling unimpeded information propagation across both dimensions and thereby alleviating the vanishing-gradient problem. The proposed HRHN also requires significantly fewer data-dependent parameters than related methods. Experiments on language modeling (LM) tasks demonstrate that the proposed architecture leads to effective models. On character-level LM on the Hutter Prize dataset, the model achieves an entropy of 2.4 bits per character. On word-level LM on the Penn Treebank dataset, it attains a perplexity of 68.1. HRHN outperforms the baseline and related models that we tested.
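The exact gating equations of HRHN are given in the body of the paper; as a rough orientation only, the following minimal PyTorch sketch illustrates the general idea of highway-gated mixing in both the temporal and hierarchical dimensions. All class, method, and parameter names here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class HighwayRecurrentCell(nn.Module):
    """One recurrent step with a highway (carry) path through time:
    s_t = h * t + s_{t-1} * (1 - t), so gradients can flow unchanged
    along the carry path across many time steps."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.cand = nn.Linear(input_size + hidden_size, hidden_size)  # candidate update
        self.gate = nn.Linear(input_size + hidden_size, hidden_size)  # transform gate

    def forward(self, x: torch.Tensor, s_prev: torch.Tensor) -> torch.Tensor:
        z = torch.cat([x, s_prev], dim=-1)
        h = torch.tanh(self.cand(z))       # candidate state
        t = torch.sigmoid(self.gate(z))    # transform gate; carry gate = 1 - t
        return h * t + s_prev * (1.0 - t)  # temporal highway mix


class HierarchicalHighwayRNN(nn.Module):
    """Stack of highway recurrent cells with an additional highway between
    layers: each layer's state is gated against the output of the layer
    below, giving a shortcut in the hierarchical (depth) dimension too."""

    def __init__(self, input_size: int, hidden_size: int, num_layers: int):
        super().__init__()
        sizes = [input_size] + [hidden_size] * num_layers
        self.cells = nn.ModuleList(
            HighwayRecurrentCell(sizes[i], hidden_size) for i in range(num_layers)
        )
        # Depth-wise highway gates (applied above the first layer,
        # where input and state dimensions match).
        self.depth_gates = nn.ModuleList(
            nn.Linear(hidden_size, hidden_size) for _ in range(num_layers - 1)
        )

    def forward(self, x: torch.Tensor, states: list) -> list:
        """x: (batch, input_size); states: per-layer (batch, hidden_size)."""
        new_states, inp = [], x
        for l, cell in enumerate(self.cells):
            s = cell(inp, states[l])
            if l > 0:  # hierarchical highway: gated mix with the layer below
                g = torch.sigmoid(self.depth_gates[l - 1](s))
                s = g * s + (1.0 - g) * inp
            new_states.append(s)
            inp = s
        return new_states


# Example: unroll a 3-layer stack over a short sequence.
rnn = HierarchicalHighwayRNN(input_size=16, hidden_size=32, num_layers=3)
states = [torch.zeros(8, 32) for _ in range(3)]
seq = torch.randn(5, 8, 16)  # (time, batch, features)
for x_t in seq:
    states = rnn(x_t, states)
```

In both dimensions the gated identity path is what keeps gradients from vanishing: when a carry gate saturates near 1, the corresponding state (temporal or hierarchical) passes through the layer unchanged.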