Environmental sound (ES) comprises the sounds that surround us. Classifying such sounds contributes to many modern applications, especially in the Internet of Things (IoT). However, accurately predicting the class of an ES is difficult because its pattern is uncertain in most cases. Although much research has aimed to develop a highly accurate environmental sound classification (ESC) system using various features extracted from the signal, feature selection and classifier design remain difficult tasks and do not always guarantee a precise result. In recent years, ESC has attracted the attention of the research community because of breakthroughs in learning algorithms. Following this trend, this study proposes two CNN models, a 1D CNN for raw signal input and a 2D CNN for Gammatone spectrogram input. Compared with the 2D CNN approach, applying a 1D CNN to the time-series waveform is a recent idea. Evaluated on two datasets (ESC-10 and US-8K), the 2D CNN achieves an overall accuracy of 80.2% on ESC-10 and 89% on US-8K, and the 1D CNN achieves 80.4% on ESC-10 and 86% on US-8K. The performance of the proposed 1D CNN model is thus comparable to that of the proposed 2D CNN model. An effective data augmentation procedure is applied to introduce variability into the given datasets and to reduce overfitting, which helps achieve this accuracy. Finally, the proposed models have a minimal number of trainable parameters, and their floating-point operations per second (FLOPS) are also minimal, especially for the 1D CNN.
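
As an illustration of how augmentation can introduce variability into raw-waveform training data, the following minimal sketch applies three generic perturbations: a random circular time shift, additive Gaussian noise at a chosen signal-to-noise ratio, and a random gain. These operations, the function name `augment_waveform`, and all parameter values are assumptions chosen for illustration; they are not necessarily the procedure used in this study.

```python
import numpy as np


def augment_waveform(wave, rng, max_shift=0.1, snr_db=20.0, gain_range=(0.8, 1.2)):
    """Return a randomly perturbed copy of a 1-D waveform (hypothetical helper)."""
    wave = np.asarray(wave, dtype=np.float64)
    n = wave.shape[0]

    # Circular time shift by up to max_shift of the clip length, in either direction.
    limit = int(max_shift * n)
    shift = int(rng.integers(-limit, limit + 1))
    out = np.roll(wave, shift)

    # Additive white Gaussian noise scaled to the requested signal-to-noise ratio.
    signal_power = np.mean(out ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    out = out + rng.normal(0.0, np.sqrt(noise_power), size=n)

    # Random amplitude scaling drawn uniformly from gain_range.
    gain = rng.uniform(*gain_range)
    return gain * out


# Example: four augmented variants of a synthetic 1-second, 8 kHz, 440 Hz tone.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 8000, endpoint=False)
clip = np.sin(2 * np.pi * 440.0 * t)
augmented = [augment_waveform(clip, rng) for _ in range(4)]
```

Each call draws a new shift, noise realization, and gain, so a single recording yields several distinct training examples of unchanged length. A 2D model would compute its spectrogram input from the augmented waveform rather than the original.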