Visual feature learning, which aims to construct an effective feature representation for visual data, has a wide range of applications in computer vision. It is often posed as a problem of nonnegative matrix factorization (NMF), which constructs a linear representation for the data. Although NMF is typically parallelized for efficiency, traditional parallelization methods suffer from either an expensive computation or a high runtime memory usage. To alleviate this problem, we propose a parallel NMF method called alternating least square block decomposition (ALSD), which efficiently solves a set of conditionally independent optimization subproblems based on a highly parallelized fine-grained grid-based blockwise matrix decomposition. By assigning each block optimization subproblem to an individual computing node, ALSD can be effectively implemented in a MapReduce-based Hadoop framework. In order to cope with dynamically varying visual data, we further present an incremental version of ALSD, which is able to incrementally update the NMF solution with a low computational cost. Experimental results demonstrate the efficiency and scalability of the proposed methods as well as their applications to image clustering and image retrieval.