The Stacked Sparse Auto-Encoder (SSAE) is a well-known hierarchical deep neural network inspired by the deep architecture of the mammalian brain. An SSAE can be trained in a greedy layer-wise manner using gradient-based methods such as limited-memory BFGS (L-BFGS). However, gradient-based methods have several disadvantages, the main one being their sensitivity to the initial values of the parameters. In this paper, a meta-heuristic algorithm combined with gradient-based search, referred to as GCIWOSS, is used to optimize the weights and biases of the SSAE. A chaos strategy is first used to initialize the population of the Invasive Weed Optimization (IWO) algorithm, and a new selection strategy is then adopted to improve population diversity and strengthen global exploration. The improved IWO prepares for a subsequent gradient-based exploitation phase, helping the optimization avoid poor local optima. In the experiments, the proposed algorithm is shown to be effective in extracting features from several image datasets, compared with L-BFGS and several other feature-learning models.
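To illustrate the chaotic initialization step, the sketch below uses a logistic map to seed an IWO-style population in place of uniform random sampling; the specific map, its parameters (`mu`, `x0`), and the function name are assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def chaotic_init(pop_size, dim, lower, upper, x0=0.7, mu=4.0):
    """Initialize a population with a logistic chaotic map (sketch).

    Successive logistic-map iterates x_{k+1} = mu * x_k * (1 - x_k)
    generate values in (0, 1), which are then scaled to the search
    range [lower, upper]. Chaotic sequences are often used this way
    to spread the initial population more evenly than pseudo-random
    sampling; the exact scheme in the paper may differ.
    """
    x = x0
    pop = np.empty((pop_size, dim))
    for i in range(pop_size):
        for j in range(dim):
            x = mu * x * (1.0 - x)              # logistic map iterate
            pop[i, j] = lower + x * (upper - lower)
    return pop

# Example: 20 candidate weight vectors of dimension 50 in [-1, 1]
population = chaotic_init(20, 50, -1.0, 1.0)
```

Each row of `population` would then serve as one candidate set of SSAE weights and biases for the IWO search, with the best candidates later refined by gradient-based exploitation.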