Staircase cleaning is a crucial and time-consuming task for maintenance of multistory apartments and commercial buildings. There are many commercially available autonomous cleaning robots in the market for building maintenance, but few of them are designed for staircase cleaning. A key challenge for automating staircase cleaning robots involves the design of Environmental Perception Systems (EPS), which assist the robot in determining and navigating staircases. This system also recognizes obstacles and debris for safe navigation and efficient cleaning while climbing the staircase. This work proposes an operational framework leveraging the vision based EPS for the modular re-configurable maintenance robot, called sTetro. The proposed system uses an SSD MobileNet real-time object detection model to recognize staircases, obstacles and debris. Furthermore, the model filters out false detection of staircases by fusion of depth information through the use of a MobileNet and SVM. The system uses a contour detection algorithm to localize the first step of the staircase and depth clustering scheme for obstacle and debris localization. The framework has been deployed on the sTetro robot using the Jetson Nano hardware from NVIDIA and tested with multistory staircases. The experimental results show that the entire framework takes an average of 310 ms to run and achieves an accuracy of for staircase recognition tasks and accuracy for obstacle and debris detection tasks during real operation of the robot.