Abstract

This research proposes an automatic region-of-interest (ROI) prediction architecture based on a deep neural network that estimates learners' ROIs from the instructor's behaviors in lecture archives, in order to generate ROI-zoomed videos that fit smaller screens such as smart devices. To achieve this goal, we first created an ROI dataset from learners' gaze data collected while they watched the archives, generating 16,039 ROI labels by clustering and smoothing the gaze points of one-second video segments with the K-means algorithm. Next, we extracted the instructor's behaviors from each segment as feature maps, considering frame difference, optical flow, OpenPose, and temporal information. We then composed an encoder-decoder architecture combining U-Net and ResNet, with these behavioral features as input, to build a deep neural network model for predicting the ROI. In the experiment, the agreement between the ROI labels and the predicted regions was evaluated with Dice loss for each feature map; the loss improved (decreased) from 0.9 for a single-image baseline to 0.4 with the OpenPose and temporal features. These results indicate the positive potential of automatic content generation for smart devices through ROI prediction from the instructor's behaviors.
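As a point of reference, the Dice loss used to score agreement between a predicted ROI mask and a gaze-derived label mask can be sketched as below. This is a minimal NumPy illustration of the standard Dice formulation, not the paper's implementation; the function name, the epsilon smoothing term, and the toy masks are assumptions for the example.

```python
import numpy as np

def dice_loss(pred, label, eps=1e-6):
    """Dice loss between two 2-D masks with values in [0, 1].

    Returns a value in [0, 1]: 0 means perfect overlap, 1 means none.
    """
    intersection = np.sum(pred * label)
    dice = (2.0 * intersection + eps) / (np.sum(pred) + np.sum(label) + eps)
    return 1.0 - dice

# Toy example: identical ROI masks give a loss near 0,
# while disjoint masks give a loss near 1.
a = np.zeros((8, 8)); a[2:6, 2:6] = 1.0   # hypothetical ROI label
b = np.zeros((8, 8)); b[0:2, 0:2] = 1.0   # hypothetical disjoint prediction
print(round(dice_loss(a, a), 3))  # → 0.0
print(round(dice_loss(a, b), 3))  # → 1.0
```

Under this convention, the reported drop from 0.9 to 0.4 corresponds to substantially better overlap between the predicted regions and the gaze-based labels.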
