The automated classification of gastrointestinal endoscopy images is of great importance in modern health care. It streamlines the diagnostic process by enabling faster and more accurate identification of gastrointestinal diseases. Although existing automated methods have demonstrated promising performance, a gap remains in consistently achieving high accuracy. This is because endoscopy images suffer from inter-class similarities and intra-class differences, which complicate the classification task. To address these problems, we propose a framework for endoscopy image classification. The proposed framework comprises three essential modules. The first module is the Local-Global Convolutional Neural Network (LG-CNN), which aims to extract local fine-grained features and capture global context. The second module is the Endoscopy-Lesion Attention Module (ELA), which enables the framework to emphasize the most crucial regions and filter out noise and other irrelevant information. Finally, the last module, the Gastrointestinal Endoscopy CNN (GE-CNN), leverages the two preceding modules to classify the input image into various categories. We evaluate the performance of the proposed framework on two challenging, publicly available datasets, namely Kvasir and HyperKvasir. The experimental results demonstrate the efficacy of the proposed framework in classifying endoscopy images.