At present, video cameras are widely deployed around the world and are used for a wide range of real-time tasks, from operating optical manipulators to identifying a person. However, there is currently no universal camera that can be used effectively across this entire spectrum of tasks. The reason is the large number of video stream parameters (frame height and width, brightness, contrast) and the requirements placed on both the hardware (speed, resistance to external influences) and the software (video stream coding and supporting code) of such cameras. This paper analyzes a modular approach to building an intelligent video camera. Based on an analysis of existing intelligent cameras and their advantages and disadvantages, the implementation of such cameras is considered for three cases. An intelligent video camera is treated as a composite device consisting of three parts: the “Sensor” (a device that transmits undistorted video information to the “Data processor” for further processing), the “Data processor” (any device that processes the video for further analysis and interpretation), and the “Result Translator” (a device that transmits the result for further processing). This decomposition makes it possible to quickly create specialized cameras on a common basis. Hardware requirements are formulated for the practical implementation options of the camera, and the maximum data transfer rate from the “Sensor” to the “Data processor” is calculated. A software architecture for the camera is also proposed. UML class diagrams were created for the critical program modules; they describe data reception from the “Sensor” by the “Data processor”, data processing in the “Data processor”, and transfer of the result from the “Data processor” to the “Result Translator”.
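The three-part decomposition described above can be illustrated with a minimal Python sketch. It is not the paper's implementation and does not reproduce its UML class diagrams: the class and method names (`Sensor.frames`, `DataProcessor.process`, `ResultTranslator.send`), the frame representation, and the example modules are all hypothetical. The function `raw_data_rate_bps` is likewise only an assumed estimate of the uncompressed Sensor-to-processor data rate (width × height × bits per pixel × frames per second), not necessarily the calculation used in the paper.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List

# Hypothetical frame type: rows of 8-bit grayscale pixel values.
Frame = List[List[int]]


class Sensor(ABC):
    """Source of undistorted frames (assumed interface)."""
    @abstractmethod
    def frames(self) -> Iterable[Frame]: ...


class DataProcessor(ABC):
    """Turns one frame into a result for later interpretation."""
    @abstractmethod
    def process(self, frame: Frame) -> float: ...


class ResultTranslator(ABC):
    """Passes a result on for further processing."""
    @abstractmethod
    def send(self, result: float) -> None: ...


class ConstantSensor(Sensor):
    """Illustrative sensor that emits `count` uniform frames."""
    def __init__(self, width: int, height: int, value: int, count: int):
        self.width, self.height = width, height
        self.value, self.count = value, count

    def frames(self) -> Iterable[Frame]:
        for _ in range(self.count):
            yield [[self.value] * self.width for _ in range(self.height)]


class MeanBrightness(DataProcessor):
    """Illustrative processor: average pixel value of the frame."""
    def process(self, frame: Frame) -> float:
        pixels = [p for row in frame for p in row]
        return sum(pixels) / len(pixels)


class ListTranslator(ResultTranslator):
    """Illustrative translator that just collects results in memory."""
    def __init__(self) -> None:
        self.sent: List[float] = []

    def send(self, result: float) -> None:
        self.sent.append(result)


def run_camera(sensor: Sensor, processor: DataProcessor,
               translator: ResultTranslator) -> None:
    """Sensor -> Data processor -> Result Translator, frame by frame."""
    for frame in sensor.frames():
        translator.send(processor.process(frame))


def raw_data_rate_bps(width: int, height: int,
                      bits_per_pixel: int, fps: int) -> int:
    """Assumed uncompressed data rate in bits per second."""
    return width * height * bits_per_pixel * fps
```

Because each part sits behind its own interface, a specialized camera in this sketch is assembled by swapping one module (for example, a different `DataProcessor`) while leaving the other two unchanged. Under the assumed formula, `raw_data_rate_bps(1920, 1080, 24, 30)` gives 1,492,992,000 bit/s, roughly 1.5 Gbit/s for uncompressed 24-bit Full HD video at 30 frames per second.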