Key frame extraction plays a central role in video summarization and content-based video analysis because it addresses the data redundancy inherent in video. It enables quick navigation and efficient organization of video content in many applications, and it can help visually impaired users through rapid object recognition and tracking. Most key frame extraction techniques consider only a single visual feature rather than multiple features or the full pictorial information of the video. This study proposes a key frame extraction method that proceeds in four steps: (i) insignificant frames are removed by pre-processing; (ii) four visual and structural feature differences between consecutive frames are extracted and aggregated to identify informative frames; (iii) the resulting frames are clustered with a proposed hybrid FCM-AHA method, which combines Fuzzy C-means (FCM) with the artificial hummingbird optimization algorithm (AHA) to circumvent FCM's tendency to become trapped in local minima; and (iv) from each cluster, the two frames having the greatest Euclidean distance from all the other frames within that cluster are selected as key frames, which removes redundant frames. Experimental results on the Open Video and YouTube datasets show that the proposed method outperforms state-of-the-art methods in both subjective qualitative analysis and objective quantitative evaluation (Precision, Recall, and F-score). Further results on real video demonstrate its applicability in real-life applications.
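The final selection step can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_key_frames`, the use of NumPy, and the input layout (one feature vector per frame plus a hard cluster label per frame, e.g. the highest-membership cluster from a fuzzy clustering) are assumptions made for illustration. The sketch also assumes that "greatest Euclidean distance from all the other frames" means the largest sum of distances to the other frames in the same cluster; the abstract does not state the exact aggregation.

```python
import numpy as np


def select_key_frames(features, labels, per_cluster=2):
    """Pick, per cluster, the frames farthest from their cluster mates.

    features: (n_frames, n_dims) array-like of per-frame feature vectors.
    labels:   (n_frames,) array-like of hard cluster labels (assumed input).
    Returns the selected frame indices in ascending order.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    selected = []
    for cluster in np.unique(labels):
        idx = np.flatnonzero(labels == cluster)
        pts = features[idx]
        # Pairwise Euclidean distances between frames of this cluster.
        diff = pts[:, None, :] - pts[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=-1))
        # Assumed score: summed distance to all other frames in the cluster.
        score = dist.sum(axis=1)
        # Keep the highest-scoring frames (fewer if the cluster is small).
        top = np.argsort(-score, kind="stable")[:per_cluster]
        selected.extend(idx[top].tolist())
    return sorted(selected)


# Toy example: six frames with 2-D features, already split into two clusters.
toy_features = [[0, 0], [1, 0], [10, 0], [0, 5], [0, 6], [0, 20]]
toy_labels = [0, 0, 0, 1, 1, 1]
print(select_key_frames(toy_features, toy_labels))  # [0, 2, 3, 5]
```

In the toy example, frames 1 and 4 lie close to another frame in their cluster, so they receive the smallest scores and are dropped as redundant.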