Net pen defects such as biofouling, vegetation, and holes are key challenges to efficient and sustainable fish production in aquaculture. These defects must be monitored to safeguard both net integrity and fish growth. Recently, deep learning methods have been adopted to solve detection and classification problems in a range of applications. However, conventional methods struggle to meet the demands of high-precision, real-time detection of net defects in a complex marine environment. To this end, this paper proposes an autonomous net pen defect detection system built on a novel multi-scale semantic segmentation topology for detecting biofouling, vegetation, and holes in aquaculture nets. In particular, we fuse the attention maps obtained at different decomposition levels of the network to generate rich feature distributions that enable accurate extraction of these three defect types. The proposed model is thoroughly tested on two private datasets and two public datasets, which were acquired from real fish farms. Across all four datasets, the proposed framework outperformed state-of-the-art methods in biofouling, vegetation, and hole detection, improving mean average precision by 6.58%, 3.69%, 6.44%, and 4.78% on the LABUST, KU, NDv1, and NDv2 datasets, respectively.
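
To make the idea of fusing attention maps across decomposition levels concrete, the following is a minimal, dependency-free sketch and not the proposed architecture. The fusion operator is not specified here, so the sketch assumes nearest-neighbour upsampling of each coarser map to the finest resolution followed by element-wise averaging. The function names `upsample_nearest`, `fuse_attention_maps`, and `apply_attention` are hypothetical and introduced only for illustration.

```python
def upsample_nearest(att, factor):
    """Nearest-neighbour upsampling of a square 2-D map (list of rows)."""
    out = []
    for row in att:
        wide = [v for v in row for _ in range(factor)]  # repeat each column
        out.extend(list(wide) for _ in range(factor))   # repeat each row
    return out


def fuse_attention_maps(att_maps, out_size):
    """Resize every per-level map to out_size x out_size and average them.

    Assumption: out_size is an integer multiple of each map's side length,
    and averaging stands in for whatever fusion the real network uses.
    """
    resized = [upsample_nearest(a, out_size // len(a)) for a in att_maps]
    n = len(resized)
    return [
        [sum(m[i][j] for m in resized) / n for j in range(out_size)]
        for i in range(out_size)
    ]


def apply_attention(features, fused):
    """Weight each channel of a (C, H, W) feature volume by the fused map."""
    return [
        [[f * a for f, a in zip(f_row, a_row)] for f_row, a_row in zip(ch, fused)]
        for ch in features
    ]


# Toy example: a 4x4 fine-level map and a 2x2 coarse-level map.
fine = [[1.0] * 4 for _ in range(4)]
coarse = [[0.0, 1.0], [1.0, 0.0]]
fused = fuse_attention_maps([fine, coarse], out_size=4)

# One-channel feature volume filled with 2.0, re-weighted by the fused map.
features = [[[2.0] * 4 for _ in range(4)]]
attended = apply_attention(features, fused)
```

In this toy setting, regions that both levels mark as relevant keep their full response, while regions supported by only one level are attenuated; a learned fusion (for example, a weighted sum or a convolution over concatenated maps) could replace the plain average.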