This paper proposes a novel remote monitoring system using a head-mounted display (HMD). The HMD gives the user a high degree of immersion and a sense of reality. It also makes repositioning the camera more convenient, because the robot arm on which the camera is mounted is controlled through the coordinates obtained from the HMD sensors. The proposed system performs two tasks. First, to render images in the HMD, the camera captures input images, and the PC connected to the robot arm sends each image in real time to the PC connected to the HMD, where it is rendered. For real-time monitoring, we increase the frame rate by reducing the data size. To do so, we define the region of interest (ROI) as the region the user can see. The ROI is cropped from the image, and the resolution of the entire image is degraded. The cropped image and the degraded image are then transmitted and merged into a single image. In this way, we reduce the data size while still letting the user monitor the image at its original quality. Second, using the inertial measurement unit (IMU) sensor and base station sensors built into the HMD, the user controls the rotation and translation of the camera by driving the robot arm with the user’s own motion. The PC connected to the HMD transmits the motion coordinates of the HMD acquired from the sensors to the PC connected to the robot arm. To control the translation of the camera, we define a coordinate system in three-dimensional space in which the y-axis runs from top to bottom and the x-axis runs from front to back. Each step motor of the robot arm, one per axis, is then driven by supplying the appropriate angle in degrees. Through the above process, the proposed method implements a remote monitoring system with five degrees of freedom.
In the experiment, we measured the data transmission delay and the displacement error of the robot arm’s motion control to determine whether the system is suitable for real-time remote monitoring. Experimental results show that the proposed system operates best at a frame rate of 15 frames per second, with resolutions of $640\times 640$ for the cropped image and $240\times 120$ for the degraded image. In addition, the average error rate of the robot arm’s displacement was 6.45% when the camera position was controlled through the robot arm.
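
The crop-and-degrade scheme described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function names (`resize_nearest`, `encode`, `decode`), the nearest-neighbour resampling, and the $1920\times 960$ source frame used in the usage note are assumptions made only for illustration, and the sizes compared are raw, uncompressed pixel data. The sender keeps the ROI at full resolution and degrades the whole frame; the receiver upsamples the degraded frame and pastes the ROI back in place.

```python
import numpy as np


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize (an assumed resampling method)."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h   # source row for each output row
    cols = np.arange(out_w) * in_w // out_w   # source column for each output column
    return img[rows[:, None], cols]           # returns a new array


def encode(frame, roi_top, roi_left, roi_h=640, roi_w=640, low_h=120, low_w=240):
    """Sender side: full-resolution ROI crop plus degraded whole frame."""
    crop = frame[roi_top:roi_top + roi_h, roi_left:roi_left + roi_w].copy()
    low = resize_nearest(frame, low_h, low_w)
    return crop, low


def decode(crop, low, full_h, full_w, roi_top, roi_left):
    """Receiver side: upsample the degraded frame and paste the ROI over it."""
    merged = resize_nearest(low, full_h, full_w)
    roi_h, roi_w = crop.shape[:2]
    merged[roi_top:roi_top + roi_h, roi_left:roi_left + roi_w] = crop
    return merged
```

For example, with a hypothetical $1920\times 960$ RGB frame, `crop, low = encode(frame, 160, 640)` followed by `decode(crop, low, 960, 1920, 160, 640)` reproduces the original pixels inside the ROI, while `crop.nbytes + low.nbytes` is roughly a quarter of `frame.nbytes`.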