This paper presents a method for remote detection of forest fires in video signals from surveillance cameras. The method uses learned redundant dictionaries for sparse representation of feature vectors extracted from image patches of three different regions: smoke, sky, and ground. During segmentation, a test image patch is assigned to the region whose dictionary gives the best sparse representation. To reduce the number of misclassified patches, a spatio-temporal cuboid of patches is built around each classified patch, and the patch takes the majority class among the patches inside the cuboid. To reduce the number of false positives, a verification step checks whether a region of interest is growing over time. Theory, results, and the issues and challenges related to implementing the forest fire monitoring system are presented, together with the performance of the method.
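The two classification steps described above can be illustrated with a minimal sketch. It is not the implementation used in the paper: the sparse coder is plain matching pursuit, swapped in for brevity, "best sparse representation" is assumed to mean smallest residual energy, and the dictionaries are tiny hand-made stand-ins for learned ones. All function names, the 3-dimensional features, and the parameters `k` and `radius` are hypothetical. The growth verification step is left out because its criterion is not stated here.

```python
from collections import Counter
import math


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mp_residual(x, atoms, k):
    """Residual energy after k steps of plain matching pursuit.

    atoms: list of unit-norm vectors forming one dictionary.
    """
    r = list(x)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        c, d = max(((dot(r, a), a) for a in atoms), key=lambda t: abs(t[0]))
        r = [ri - c * di for ri, di in zip(r, d)]
    return dot(r, r)


def classify_patch(x, dictionaries, k=2):
    """Assign x to the class whose dictionary leaves the smallest residual."""
    return min(dictionaries, key=lambda name: mp_residual(x, dictionaries[name], k))


def cuboid_vote(labels, t, i, j, radius=1):
    """Majority label in the cuboid around labels[t][i][j], clipped at the borders."""
    votes = Counter()
    for tt in range(max(0, t - radius), min(len(labels), t + radius + 1)):
        for ii in range(max(0, i - radius), min(len(labels[tt]), i + radius + 1)):
            for jj in range(max(0, j - radius), min(len(labels[tt][ii]), j + radius + 1)):
                votes[labels[tt][ii][jj]] += 1
    return votes.most_common(1)[0][0]


# Toy stand-ins for learned dictionaries (assumption: 3-dimensional features).
dictionaries = {
    "smoke": [normalize([1, 1, 0]), normalize([1, -1, 0])],
    "sky": [normalize([0, 0, 1]), normalize([0, 1, 1])],
    "ground": [normalize([0, 1, 0]), normalize([0, 1, -1])],
}

print(classify_patch([0.9, 0.4, 0.1], dictionaries))  # smoke

# Three frames of a 3x3 patch grid: one isolated "smoke" label amid "sky".
frames = [[["sky"] * 3 for _ in range(3)] for _ in range(3)]
frames[1][1][1] = "smoke"
print(cuboid_vote(frames, 1, 1, 1))  # sky
```

In the second example the isolated "smoke" label is outvoted by its 26 spatio-temporal neighbours, which is the kind of misclassification the cuboid vote is meant to suppress.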