The region of interest changes according to the task, even in the same situation. In this study, a method for detecting the region of interest within a computer screen using a monocular camera is proposed. In contrast to gaze tracking techniques that require particular devices (e.g., an eye tracker or an RGB-D device) and complex calibration, a cheap and more convenient monocular camera is used in this study to solve the eye gaze tracking problem. First, the human face is detected in a real-time video sequence using histogram of oriented gradients (HoG) features. Then, the landmarks around the eyes, which reflect the gaze position, are extracted. Next, the iris centers are detected in the eye region. To reduce the gaze error caused by head movement, a three-dimensional head model is proposed to estimate head pose. Finally, the eye region is tracked by calculating the eye vectors and head movement. Experiments were performed to evaluate the face detection, landmark extraction, iris detection, eye movement estimation, and head pose estimation on databases such as the Hong Kong, BioID, and Boston University head pose databases. In addition, gaze tracking experiments were performed on a real-time video sequence. Deviation is calculated as the Euclidean distance between the real and estimated points. The results show that the method achieves an average error of 1.85° with the head fixed and 3.58° with head movement in the range of −45° to 45°. The requirement is to detect the user's attention within the screen area. Although its accuracy is not state-of-the-art, our method reaches a level comparable to that of other methods. Moreover, because of the characteristics of human eye imaging, what matters is a region rather than a single specific point, so the proposed method meets this requirement.
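
As an illustration of the deviation measure mentioned above, the following minimal sketch computes the Euclidean distance between a real and an estimated on-screen point and converts it to an angle. The conversion step, the function names, and the parameters `pixel_pitch_mm` and `viewing_distance_mm` are assumptions made for this example and are not taken from the study; the angle formula further assumes the real point lies roughly on the line of sight perpendicular to the screen.

```python
import math


def gaze_error_degrees(true_pt, est_pt, pixel_pitch_mm, viewing_distance_mm):
    """Angular deviation between a real and an estimated on-screen gaze point.

    true_pt, est_pt: (x, y) screen coordinates in pixels.
    pixel_pitch_mm: physical size of one pixel (assumed parameter).
    viewing_distance_mm: eye-to-screen distance (assumed parameter).
    """
    # Euclidean distance between the two points, in pixels.
    dist_px = math.hypot(est_pt[0] - true_pt[0], est_pt[1] - true_pt[1])
    # Convert the on-screen distance to millimetres.
    dist_mm = dist_px * pixel_pitch_mm
    # Angle subtended at the eye, assuming the real point is near the
    # foot of the perpendicular from the eye to the screen.
    return math.degrees(math.atan2(dist_mm, viewing_distance_mm))


def mean_gaze_error(true_pts, est_pts, pixel_pitch_mm, viewing_distance_mm):
    """Average angular deviation over paired real/estimated points."""
    errors = [
        gaze_error_degrees(t, e, pixel_pitch_mm, viewing_distance_mm)
        for t, e in zip(true_pts, est_pts)
    ]
    return sum(errors) / len(errors)
```

Averaging the per-point deviations in this way gives a single figure in degrees of the kind reported above for the fixed-head and moving-head conditions.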