The contemporary trend in large-scale enterprises, such as public infrastructures or industrial plants, is to adopt video surveillance architectures and platforms. A network of cameras is installed at critical sites, and the captured video data are monitored. Video surveillance supports quality assurance of services and/or products (adherence to the predefined procedures of a production process), traffic management and control (in high-density urban areas), security and safety (prevention of actions that may lead to hazardous situations), crisis management in public areas (e.g., train stations, airports), and a range of other applications of high industrial and social impact. However, current commercial video surveillance systems mostly rely on manual supervision, which makes them both inefficient and subjective. They are inefficient because no human can continually concentrate on monitors that display different activities in different areas. They are subjective because humans often interpret the same visual information differently under different conditions. For this reason, methods, tools and algorithms that aim to detect and recognize high-level concepts and their spatio-temporal and causal relations (in order to identify semantic video activities, actions and procedures) have been a focus of research in recent years, with many efforts made within the computer vision and machine learning communities. Traditional approaches to event detection in videos assume well-structured environments, and they fail to operate in a largely unsupervised way under adverse and uncertain conditions that differ from those on which they were trained. Another drawback of current methods is that they focus on narrow domains, using specific concept detectors such as “human faces”, “cars” and “buildings”.
Multimed Tools Appl (2010) 50:1–6, DOI 10.1007/s11042-010-0514-2

This special issue seeks original, highly innovative research in the area of self-configurable cognitive video supervision across several domains. The Call for Papers was very well received, and we collected many high-quality papers. A rigorous review process led to the selection of so many very good papers that the Editor-in-Chief agreed to devote two special issues to them, so that all accepted papers could be included. This special issue consists of eleven (11) papers that cover most areas of computer vision research on the analysis of events, actions and workflows. The papers can be