Fisheries regulations require detailed catch reporting on commercial fishing vessels. Sustainable management of fish stocks depends on a robust estimate of the number of fish caught and of the species composition. Catch recording is often done manually by human observers on fishing vessels. Human observers are costly, and the consistency of the data stream depends on observer availability and the weather. On-vessel cameras (electronic monitoring, EM) are a growing alternative to human observers. However, on-land human auditors must still review hundreds of hours of video recorded during fishing trips that can last for weeks. In this paper, a framework is presented to automatically detect fish in EM videos, count the total number of fishing events, and classify the fish species. First, a deep learning-based computer vision model is developed to efficiently detect fish and fishers onboard a vessel. Second, a vision-based tracking pipeline tracks the detected fish and counts the total number of fishing events in the videos. Third, the extracted fishing events are classified by a deep learning-based fish species classifier to provide the distribution of fish species caught during a fishing trip. For our experiments, the datasets were prepared using the electronic monitoring data of multiple fishing trips of a fishing vessel. The videos were recorded on Australian longline vessels targeting tunas and billfish. For the fish detection task, video frames were extracted and labelled manually to provide the ground truth. For the fish species classification task, hundreds of fish images of multiple species were cropped to provide a training dataset for the fish classifier. For the fish counting task, manual counts of the fishing events for individual fish species were generated for the test fishing trips.
The developed fish and fisher detector achieves a mean Average Precision of 87.0 % for fish and 94.0 % for fishers on test video frames. The fishing event detection pipeline achieves an Average Precision of 81.0 % and an Average Recall of 74.5 % on test videos. The fish species classifier achieves a Top-1 accuracy of 91.11 % on cropped fish images and 89.05 % on fishing events extracted from the videos. These experimental results indicate that the proposed computer vision and artificial intelligence-based solution for video analysis has the potential to automate the auditing of electronic monitoring footage and to contribute to the sustainable management of fish stocks.
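To make the detect, track, and count stages concrete, the following is a minimal illustrative sketch of how per-frame fish detections could be linked into tracks and counted as fishing events. It is not the tracking pipeline used in this work: the greedy intersection-over-union (IoU) matching, the function names `iou` and `count_fishing_events`, and the parameters `iou_threshold`, `min_hits`, and `max_missed` are all assumptions introduced for illustration. Detections are assumed to arrive as axis-aligned boxes `(x1, y1, x2, y2)`, one list per video frame.

```python
from dataclasses import dataclass

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Track:
    box: Box              # last matched box
    hits: int = 1         # number of frames with a matched detection
    missed: int = 0       # consecutive frames without a match
    counted: bool = False  # True once the track was counted as an event


def count_fishing_events(
    frames: list[list[Box]],
    iou_threshold: float = 0.3,  # assumed value, not from the paper
    min_hits: int = 3,           # assumed value, not from the paper
    max_missed: int = 2,         # assumed value, not from the paper
) -> int:
    """Count tracks that persist for at least `min_hits` frames."""
    active: list[Track] = []
    events = 0
    for detections in frames:
        unmatched = list(detections)
        survivors: list[Track] = []
        for track in active:
            # Greedy association: take the best-overlapping free detection.
            best = max(unmatched, key=lambda d: iou(track.box, d), default=None)
            if best is not None and iou(track.box, best) >= iou_threshold:
                unmatched.remove(best)
                track.box, track.missed = best, 0
                track.hits += 1
                survivors.append(track)
            else:
                track.missed += 1
                if track.missed <= max_missed:
                    survivors.append(track)
        # Every detection left over starts a new candidate track.
        survivors.extend(Track(box=d) for d in unmatched)
        # Count each track once, when it has been seen often enough.
        for track in survivors:
            if not track.counted and track.hits >= min_hits:
                track.counted = True
                events += 1
        active = survivors
    return events
```

Requiring a minimum number of matched frames before counting is one simple way to keep a single spurious detection from being reported as a catch. A production tracker would typically use a more robust association step than this greedy matching.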