This study examines an automatic broadcast soccer video composition system. The research is important as the ability to automatically compose broadcast sports video will not only improve broadcast video generation efficiency, but also provides the possibility to customize sports video broadcasting. We present a novel approach to the two major issues required in the system's implementation, specifically the camera view selection/switching module and the automatic replay generation module. In our implementation, we use multi-modal framework to perform video content analysis, event and event boundary detection from the raw unedited main/sub-camera captures. This framework explores the possible cues using mid-level representations to bridge the gap between low-level features and high-level semantics. The video content analysis results are utilized for camera view selection/switching in the generated video composition, and the event detection results and mid-level representations are used to generate replays which are automatically inserted into the broadcast soccer video. Our experimental results are promising and found to be comparable to those generated by broadcast professionals.