Feature selection is an important preprocessing method for reducing redundant information. To effectively evaluate the classification information hidden in a given attribute subset, a novel rough set model is proposed by integrating covering-based rough fuzzy sets with multi-granulation rough sets. Specifically, fuzzy β-neighborhoods are employed to describe the information representation and knowledge fusion of covering families, from which a pair of approximation operators is formulated and a new multi-granulation rough fuzzy set model is introduced. This generalized model provides a unified perspective on existing rough set models. We then investigate its axiomatic characterizations from the optimistic and pessimistic viewpoints. Finally, data reduction is carried out with the aim of preserving discrimination ability. A multi-granulation significance function of a candidate attribute with respect to fuzzy decisions is defined, based on which a greedy algorithm is developed for multi-granulation feature selection. Experiments on twelve datasets of different types show that our model is efficient and superior to three popular algorithms in terms of reduction rate and classification performance.
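The greedy, significance-driven selection described above can be illustrated by a minimal sketch. It is an assumption-laden illustration, not the algorithm of this paper: the names `greedy_select`, `discerned_pairs`, and `make_significance` are hypothetical, and the significance used here is a simple crisp count of discerned sample pairs standing in for the multi-granulation significance function, whose definition is not given in this abstract.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def greedy_select(
    candidates: Sequence[int],
    significance: Callable[[int, List[int]], float],
    eps: float = 1e-9,
) -> List[int]:
    """Forward greedy selection: repeatedly add the most significant attribute.

    Stops when no remaining attribute has significance above `eps`.
    """
    selected: List[int] = []
    remaining = list(candidates)
    while remaining:
        gains = {a: significance(a, selected) for a in remaining}
        best = max(remaining, key=lambda a: gains[a])
        if gains[best] <= eps:
            break  # no candidate improves discrimination ability
        selected.append(best)
        remaining.remove(best)
    return selected


def discerned_pairs(data, labels, attrs) -> int:
    """Count sample pairs with different labels that `attrs` can tell apart.

    Assumption: a crisp stand-in for discrimination ability, used only
    to make the sketch runnable.
    """
    count = 0
    for i, j in combinations(range(len(data)), 2):
        if labels[i] != labels[j] and any(data[i][a] != data[j][a] for a in attrs):
            count += 1
    return count


def make_significance(data, labels):
    """Significance of attribute `a` = gain in discerned pairs when adding it."""
    def significance(a: int, selected: List[int]) -> float:
        return (discerned_pairs(data, labels, selected + [a])
                - discerned_pairs(data, labels, selected))
    return significance


# Toy data: attribute 0 determines the label, attribute 1 is partly
# informative, attribute 2 is constant.
data = [
    [0, 0, 1],
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]
labels = [0, 0, 1, 1]

reduct = greedy_select([0, 1, 2], make_significance(data, labels))
print(reduct)
```

On this toy data, attribute 0 alone discerns every pair of samples with different labels, so the loop stops after one step. Passing the significance measure as a callable keeps the greedy loop independent of how significance is defined, so a fuzzy or multi-granulation measure could be substituted without changing the selection procedure.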