Computer vision is a dynamic and rapidly evolving field within the broader domain of artificial intelligence. In surveillance monitoring systems, one of the central tasks is object detection: identifying and localizing objects of interest in video sequences in order to support the safety and security of people. Detecting multiple objects in video sequences is challenging and often suffers from low accuracy and erroneous bounding-box regression. In this paper, an enhanced Faster R-CNN model is proposed and trained to compute region proposals through convolutional layers on scenes that differ in lighting and motion, with attention to spatial analysis. Such enhancements may include architectural improvements, novel training strategies, or the incorporation of additional data sources to improve the model's overall performance. Evaluated on pedestrian video, the proposed model achieves a higher detection accuracy than single-detector techniques.
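To make the bounding-box regression step mentioned above concrete, the following is a minimal sketch of the standard Faster R-CNN box parameterization and smooth L1 regression loss. It illustrates the baseline formulation only and is not the enhanced model proposed here; the function names `encode_box`, `decode_box`, and `smooth_l1`, and the (center x, center y, width, height) box layout, are assumptions chosen for illustration.

```python
import math

# Boxes and anchors are assumed to be (cx, cy, w, h) tuples:
# center coordinates plus width and height, in pixels.

def encode_box(box, anchor):
    """Regression targets of a ground-truth box relative to an anchor."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return (
        (x - xa) / wa,       # t_x: center shift, scaled by anchor width
        (y - ya) / ha,       # t_y: center shift, scaled by anchor height
        math.log(w / wa),    # t_w: log-scale width ratio
        math.log(h / ha),    # t_h: log-scale height ratio
    )

def decode_box(deltas, anchor):
    """Invert encode_box: apply predicted deltas to an anchor."""
    tx, ty, tw, th = deltas
    xa, ya, wa, ha = anchor
    return (
        xa + tx * wa,
        ya + ty * ha,
        wa * math.exp(tw),
        ha * math.exp(th),
    )

def smooth_l1(pred, target):
    """Smooth L1 loss summed over the four box coordinates."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        # Quadratic near zero, linear for larger errors.
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total
```

For example, `decode_box(encode_box(box, anchor), anchor)` recovers `box` up to floating-point error, which is why a poorly regressed delta translates directly into a misplaced bounding box.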