In anomaly detection, well-known techniques and state-of-the-art models often struggle to interpret the latent space, which limits the accuracy of their behavioral classification. First, data points are sub-optimally distributed within the latent space, leaving regions of normal behavior diffuse and hard to distinguish from abnormal regions. Second, meaningful, separable, and indicative features can be difficult to identify within the latent space. Third, processing time at the inference stage remains relatively slow.

This paper aims to improve the accuracy of network anomaly detection by proposing two novel deep hierarchical representation learning models: Deep Nested Clustering Auto-Encoder (DNCAE) and Deep Clustering Hierarchical Auto-Encoder (DCHAE). Both models adopt a nested branch structure that uses dual deep auto-encoders to establish hierarchical latent spaces; in each latent space, clustering algorithms spatially optimize and refine the data points. This approach improves the separation between normal and abnormal data points and makes indicative features easier to identify.

To assess the effectiveness of the approach and the quality of the resulting features, both models were used in conjunction with ten different one-class anomaly detectors. Each of the ten detectors was evaluated on popular network intrusion datasets, notably NSL-KDD, UNSW-NB15, CIC-IDS-2017, CSE-CIC-IDS-2018, and CTU13. Experimental results show that both proposed models achieve higher accuracy than existing baselines and current state-of-the-art models. Processing time at the inference stage is also significantly reduced.