Accurate urban areas information is important for a variety of applications, especially city planning and natural disaster prediction and management. In recent years, extraction of urban structures from remotely sensed images has been extensively explored. The key advantages of this imaging modality are reduction of surveying expense and time. It also elevates restrictions on ground surveys. Thus far, much research typically extracts these structures from very high resolution satellite imagery, which are unfortunately of relatively poor spectral resolution, resulting in good precision yet moderate accuracy. Therefore, this paper investigates extraction of buildings from middle and high resolution satellite images by using spectral indices (Normalized Difference Building Index: NDBI, Normalized Difference Vegetation Index: NDVI, Soil Adjustment Vegetation Index: SAVI, Modified Normalized Difference Index: MNDWI, and Global Environment Monitoring Index: GEMI) by means of various Machine Learning methods (Artificial Neural Network: ANN, K-Nearest Neighbor: KNN, and Support Vector Machine: SVM) and Data Fusion (i.e., Majority Voting). Herein empirical results suggested that suitable learning methods for urban areas extraction are in preferring order Data Fusion, SVM, KNN, and ANN. Their accuracies were 85.46, 84.86, 84.66, and 84.91%, respectively.