Robotic mapping and odometry are the core capabilities of a navigation system for an autonomous mobile robot. However, the robot's state estimate typically accumulates drift over time, and its accuracy degrades critically when only proprioceptive sensors are used in indoor environments. Moreover, the accuracy of ego-motion estimation is severely diminished in dynamic environments because of the influence of both moving objects and light reflections. To this end, a multi-sensor fusion technique is employed to bound the navigation error by exploiting the complementary nature of an Inertial Measurement Unit (IMU) and the bearing information of a camera. In this paper, we propose a robust tightly-coupled Visual-Inertial Navigation System (VINS) based on multi-stage outlier removal within the Multi-State Constraint Kalman Filter (MSCKF) framework. First, an efficient and lightweight VINS algorithm is developed for robust state estimation of a mobile robot by fusing a stereo camera and an IMU, targeting dynamic indoor environments. Furthermore, we propose strategies to mitigate the impact of dynamic objects through multi-stage outlier removal driven by feedback from the estimated states. The proposed VINS is implemented and validated on public datasets. In addition, we develop a sensor system and evaluate the VINS algorithm in dynamic indoor environments under different scenarios. The experimental results show better robustness and accuracy with low computational complexity compared to state-of-the-art approaches.
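The abstract only summarizes the multi-stage outlier removal; the mechanism is detailed later in the paper. As a minimal illustration of one common ingredient of such a stage, the sketch below shows a standard Mahalanobis-distance (chi-square) gating test of the kind typically applied to feature residuals inside MSCKF-style filter updates to reject measurements corrupted by dynamic objects. This is an assumption-laden sketch in NumPy/SciPy, not the authors' implementation; the function name `mahalanobis_gate` and the 95% confidence threshold are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def mahalanobis_gate(r, H, P, R, confidence=0.95):
    """Chi-square gating test for one feature's stacked residual.

    Illustrative sketch only; names and threshold are assumptions.

    r : (m,)   measurement residual
    H : (m, n) measurement Jacobian w.r.t. the filter state
    P : (n, n) filter state covariance
    R : (m, m) measurement noise covariance
    Returns True if the residual is statistically consistent with the
    filter's uncertainty (inlier), False if it should be rejected.
    """
    S = H @ P @ H.T + R                 # innovation covariance
    d2 = r @ np.linalg.solve(S, r)      # squared Mahalanobis distance
    return d2 <= chi2.ppf(confidence, df=r.size)
```

In a filter update, residuals that fail this gate would simply be excluded from the stacked measurement before computing the Kalman gain, which is one plausible way feedback from the estimated state can suppress dynamic-object features.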