Selecting an optimal feature subset from all available features is a vital data pre-processing task that serves several purposes, such as reducing dimensionality, lowering the computational cost of subsequent data processing (e.g., clustering, classification and regression), and improving the performance of the processing technique itself. To serve these purposes, feature selection approaches, which are fundamentally categorized into filters and wrappers, aim to eliminate irrelevant, redundant and erroneous features from the data. Each category comes with its own advantages and disadvantages: wrappers generally provide higher classification performance than filters, whereas filters are computationally more efficient than wrappers. In order to bring the advantages of wrappers and filters together, i.e., to obtain higher classification performance with a smaller feature subset in a shorter time, this paper proposes a differential evolution approach that combines filter and wrapper approaches through an improved information-theoretic local search mechanism based on the concepts of fuzziness, enabling it to cope with both continuous and discrete datasets. To show the superiority of the proposed approach, it is examined and compared with traditional and recent evolutionary feature selection approaches on several benchmarks from different well-known data repositories.
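
As a rough illustration of the kind of filter-wrapper hybrid described above (and not the paper's actual algorithm), the sketch below combines a binary differential evolution wrapper, whose fitness is cross-validated k-NN accuracy, with a filter-style local search that uses mutual information to decide which single feature to add or drop. The classifier, DE parameters, dataset, and the add/drop rule are all illustrative assumptions; the fuzzy information-theoretic mechanism proposed in the paper is not reproduced here.

```python
# Minimal sketch: binary DE wrapper + mutual-information-guided local search.
# Assumptions (not from the paper): k-NN wrapper, DE/rand/1, breast-cancer data.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_features = X.shape[1]
mi = mutual_info_classif(X, y, random_state=0)  # filter information: per-feature MI

def fitness(mask):
    """Wrapper fitness: cross-validated k-NN accuracy on the selected features."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=3).mean()

def local_search(mask, fit):
    """Filter-guided local search: try adding the highest-MI unselected feature
    and dropping the lowest-MI selected feature; keep changes that improve fitness."""
    for flip in [np.argmax(np.where(mask, -np.inf, mi)),   # candidate feature to add
                 np.argmin(np.where(mask, mi, np.inf))]:   # candidate feature to drop
        trial = mask.copy()
        trial[flip] = ~trial[flip]
        f = fitness(trial)
        if f > fit:
            mask, fit = trial, f
    return mask, fit

# Binary DE: continuous genes in [0, 1], thresholded at 0.5 to obtain feature masks.
pop_size, F, CR, generations = 20, 0.5, 0.9, 30
pop = rng.random((pop_size, n_features))
fits = np.array([fitness(ind > 0.5) for ind in pop])

for _ in range(generations):
    for i in range(pop_size):
        a, b, c = pop[rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)]
        mutant = np.clip(a + F * (b - c), 0, 1)          # DE/rand/1 mutation
        cross = rng.random(n_features) < CR
        cross[rng.integers(n_features)] = True           # guarantee at least one gene crosses
        trial = np.where(cross, mutant, pop[i])
        f = fitness(trial > 0.5)
        if f >= fits[i]:
            pop[i], fits[i] = trial, f
    # Apply the filter-guided local search to the current best individual.
    best = int(np.argmax(fits))
    mask, f = local_search(pop[best] > 0.5, fits[best])
    pop[best], fits[best] = mask.astype(float), f

best = int(np.argmax(fits))
print("selected features:", np.flatnonzero(pop[best] > 0.5), "accuracy:", fits[best])
```

In this sketch the wrapper (classifier accuracy) drives the global DE search, while the filter information (mutual information) only steers cheap single-feature refinements, which is one plausible way to trade accuracy against subset size and runtime as motivated in the abstract.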