Convolutional neural network (CNN) models for remote sensing (RS) scene classification are largely built on networks pretrained on the general-purpose ImageNet dataset from computer vision. Such pretrained networks can easily be adapted for transfer learning in RS scene classification. However, the accuracy of transfer learning may decline because RS images differ considerably from the natural images in ImageNet, so a CNN model pretrained on ImageNet may not be sufficient for accurate classification of RS image scenes. Furthermore, most pretrained models have large memory footprints, which further increases computational requirements. In this work, we explore SLGE-based random search with early stopping to discover CNN architectures for both single-label and multilabel RS scene classification tasks. The SLGE architecture search space can represent multipath Inception-like modular cells with skip connections, similar to human-expert designs. Experimental results on four RS scene classification benchmarks show that the automatically discovered networks classify RS image scenes with promising accuracy compared with fine-tuned pretrained CNN models. Using fewer parameters and 0.56B FLOPs, our best network achieves classification accuracies of 96.56% and 96.10% on the single-label NWPU-RESISC45 and AID RGB aerial image datasets, respectively, and 99.76% and 93.89% on the single-label EuroSAT and multilabel BigEarthNet multispectral satellite image datasets, respectively. These results position our approach among the best of the state of the art.
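
For illustration only, the sketch below shows the general shape of random search with early stopping over a cell-based architecture search space, as described above. The operation set, the architecture encoding, the `train_and_validate` proxy, and the patience setting are all hypothetical placeholders (the evaluation here is simulated), not the paper's SLGE implementation.

```python
import random

# Hypothetical cell-level operations; the real SLGE search space represents
# multipath Inception-like modular cells with skip connections.
OPS = ["conv3x3", "conv5x5", "sep_conv3x3", "max_pool3x3", "skip_connect"]


def sample_architecture(num_nodes=4):
    """Randomly sample a cell: each node picks an operation and one earlier node as input."""
    return [(random.choice(OPS), random.randint(0, i)) for i in range(num_nodes)]


def train_and_validate(arch, epoch):
    """Placeholder proxy: in practice, build a network from `arch`, train it for
    one epoch on the RS dataset, and return its validation accuracy."""
    return random.uniform(0.80, 0.97)  # stand-in for a real training loop


def random_search(num_candidates=20, max_epochs=30, patience=5):
    """Evaluate randomly sampled architectures, stopping each candidate early
    when validation accuracy stops improving for `patience` epochs."""
    best_arch, best_acc = None, 0.0
    for _ in range(num_candidates):
        arch = sample_architecture()
        cand_best, stale = 0.0, 0
        for epoch in range(max_epochs):
            acc = train_and_validate(arch, epoch)
            if acc > cand_best:
                cand_best, stale = acc, 0
            else:
                stale += 1
            if stale >= patience:  # early stopping for this candidate
                break
        if cand_best > best_acc:
            best_arch, best_acc = arch, cand_best
    return best_arch, best_acc


if __name__ == "__main__":
    arch, acc = random_search()
    print(f"best architecture: {arch}, proxy accuracy: {acc:.4f}")
```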