Deep neural networks have achieved strong results in image classification. However, hierarchical image classification (HIC), which assigns each image multiple labels organized in a hierarchy, remains challenging. To address this, we propose HCAM-CL, a framework that integrates a hierarchical cross-attention mechanism with a CNN-LSTM architecture. HCAM-CL captures the relevance between an image and its labels while also learning the hierarchical dependencies among those labels, and it handles both fixed-length and variable-length classification paths through the hierarchy. Within the model, the CNN module extracts image features. The hierarchical cross-attention mechanism aligns these features with the levels of the hierarchy, adaptively weighting the importance of different spatial regions at each level. Finally, the LSTM module generates the predictions by treating HIC as a sequence generation problem. Extensive experiments on CIFAR-10, CIFAR-100, and design patent image datasets show that HCAM-CL consistently outperforms other state-of-the-art methods in hierarchical image classification.
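To make the pipeline concrete, the following is a minimal, illustrative sketch and not the HCAM-CL implementation. It assumes that region features have already been produced by some CNN, uses one hand-set query vector per hierarchy level for scaled dot-product cross-attention over regions, and swaps the LSTM for a plain greedy loop that emits one label per level. The names `cross_attend`, `decode_path`, `END`, the toy label tree, and all vectors are hypothetical and chosen only for illustration.

```python
import math

END = "<end>"  # hypothetical stop label that allows variable-length paths


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query, regions):
    """Scaled dot-product attention of one level query over region features.

    Returns the attention weights (one per region) and the weighted
    average of the region features (the context vector).
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * r for q, r in zip(query, region)) / scale
              for region in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    context = [sum(w * region[d] for w, region in zip(weights, regions))
               for d in range(dim)]
    return weights, context


def decode_path(regions, level_queries, children, label_vectors, root="<root>"):
    """Greedily emit one label per level, restricted to children of the
    previously emitted label. Stops early on END or at a leaf.

    Assumption: a stateless greedy loop stands in for the LSTM decoder.
    """
    path, parent = [], root
    for query in level_queries:
        candidates = children.get(parent, [])
        if not candidates:
            break  # reached a leaf of the label tree
        _, context = cross_attend(query, regions)
        best = max(candidates,
                   key=lambda lab: sum(c * v for c, v in
                                       zip(context, label_vectors[lab])))
        if best == END:
            break  # the path ends above the deepest level
        path.append(best)
        parent = best
    return path


# Toy two-level hierarchy; all values are made up for illustration.
children = {
    "<root>": ["animal", "vehicle"],
    "animal": ["cat", "dog"],
    "vehicle": ["truck", END],
}
label_vectors = {
    "animal": [1.0, 0.0], "vehicle": [0.0, 1.0],
    "cat": [0.0, 1.0], "dog": [1.0, 0.0],
    "truck": [1.0, 0.0], END: [0.0, 1.0],
}
level_queries = [[1.0, 0.0], [0.0, 3.0]]  # one query per hierarchy level

image_a = [[2.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # three regions, two features each
image_b = [[0.0, 2.0], [0.0, 0.0]]              # two regions

print(decode_path(image_a, level_queries, children, label_vectors))
print(decode_path(image_b, level_queries, children, label_vectors))
```

In this toy setup the first image yields a full two-level path, while the second stops after one level because the stop label scores highest, which illustrates how fixed-length and variable-length paths can share one decoding loop. In a trained model the level queries, label embeddings, and decoder state would be learned rather than hand-set.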