Video summarization is a video compaction technique that creates a shorter yet informative version of an original video. It has offered solutions for many media, user, and engineering applications. Although sports video summarization has been an active research topic for some time, there is still no multi-modal, dynamic, generic, and domain-knowledge-based approach for summarizing Cricket videos. This paper presents a multi-modal approach to summarizing Cricket videos that captures domain knowledge from audio-visual cues. A pipeline of two neural networks is proposed to dynamically segment and dynamically summarize Cricket videos for a generic target audience. The first network detects Cricket bowling activity (a visual feature) and uses it to segment the video dynamically. The segments are then forwarded to the second network, which identifies key segments. This key segment detection module relies on audio analysis of the Cricket video stream to identify segments that are exciting, content-representative, and informative with respect to the Cricket domain. Experimental analysis on two newly proposed benchmark datasets, the DPCS (Delivery Play Cricket Sport) image dataset and the audio-based EXINP (Excited Interval Normal Play) Cricket dataset, shows promising results. The results indicate that the proposed multi-modal approach generates an exciting, content-representative, informative, generic, and dynamic summary that incorporates domain knowledge of the sport.
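The two-stage flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: both neural networks are replaced by precomputed per-frame flags, and every name here (`segment_by_deliveries`, `excited_fraction`, `select_key_segments`, the 0.5 threshold) is a hypothetical choice made for illustration. The sketch assumes that a visual classifier has already marked each frame as bowling or not, and that an audio classifier has already marked each frame as excited or not.

```python
from typing import List, Sequence, Tuple

# A segment is a half-open frame range [start, end).
Segment = Tuple[int, int]


def segment_by_deliveries(is_bowling: Sequence[bool]) -> List[Segment]:
    """Stage 1 (assumed behaviour): open a new segment at every frame where
    bowling activity begins, so segment length varies with each delivery.
    Frames before the first detected delivery are not assigned to a segment."""
    starts = [
        i for i, flag in enumerate(is_bowling)
        if flag and (i == 0 or not is_bowling[i - 1])
    ]
    ends = starts[1:] + [len(is_bowling)]
    return list(zip(starts, ends))


def excited_fraction(segment: Segment, is_excited: Sequence[bool]) -> float:
    """Fraction of frames in the segment whose audio was flagged as excited."""
    start, end = segment
    frames = is_excited[start:end]
    return sum(frames) / len(frames) if frames else 0.0


def select_key_segments(
    segments: Sequence[Segment],
    is_excited: Sequence[bool],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[Segment]:
    """Stage 2 (assumed behaviour): keep segments whose excited-audio
    fraction reaches the threshold."""
    return [s for s in segments if excited_fraction(s, is_excited) >= threshold]


def summarize(is_bowling: Sequence[bool], is_excited: Sequence[bool]) -> List[Segment]:
    """Chain the two stages: delivery-based segmentation, then audio-based selection."""
    return select_key_segments(segment_by_deliveries(is_bowling), is_excited)
```

For example, with twelve frames in which bowling starts at frames 1, 5, and 9, and the audio is excited only during frames 5 to 7, `summarize` keeps only the middle delivery, the segment `(5, 9)`. In a real system the two boolean sequences would come from the trained visual and audio models rather than being supplied by hand.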