Recent advances in technologies for capturing video data have opened a vast amount of new application areas in visual sensor networks. Among them, the incorporation of light wave cameras on Ambient Intelligence (AmI) environments provides more accurate tracking capabilities for activity recognition. Although the performance of tracking algorithms has quickly improved, symbolic models used to represent the resulting knowledge have not yet been adapted to smart environments. This lack of representation does not allow to take advantage of the semantic quality of the information provided by new sensors. This paper advocates for the introduction of a part-based representational level in cognitive-based systems in order to accurately represent the novel sensors' knowledge. The paper also reviews the theoretical and practical issues in part-whole relationships proposing a specific taxonomy for computer vision approaches. General part-based patterns for human body and transitive part-based representation and inference are incorporated to an ontology-based previous framework to enhance scene interpretation in the area of video-based AmI. The advantages and new features of the model are demonstrated in a Social Signal Processing (SSP) application for the elaboration of live market researches.