This paper addresses the detection of finfish and krill in echograms. Finfish, in particular Pacific hake, are used both as human food and as fish meal. Krill, harvested for aquaculture and aquariums, are a primary food source for finfish, including hake; the spatial distribution of hake therefore follows that of krill. Stock assessments require accurate differentiation of krill from finfish (hake) in acoustic echograms. This paper proposes a semantic segmentation paradigm for the pixel-level classification of multi-frequency information to detect co-occurring finfish and krill. This paradigm is particularly well suited to identifying diffuse, cloud-like krill aggregations intertwined with small schools of finfish that range from sparse to dense. We propose U-MSAA-Net, a deep learning U-Net-like framework with novel multi-scale additive attention (MSAA) modules. MSAA modules leverage both the contextual and the local information in the feature maps available at each level of the network's decoding phase, efficiently suppressing feature responses from regions of lesser semantic value. Experimental results on a new finfish and krill data set, spanning nine months of acoustic data and covering a variety of conditions, show that U-MSAA-Net outperforms both traditional texture-based machine learning methods and deep learning methods based on state-of-the-art semantic segmentation networks. Additional experiments on a data set containing schools of herring and salmon confirm the versatility of U-MSAA-Net, its superior accuracy, and its ability to detect schools of varying sizes. U-MSAA-Net is a first step toward a comprehensive tool for stock and ecosystem assessments.
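The MSAA modules build on the general idea of additive attention gating in U-Net-style decoders. As a rough illustration of that underlying idea only, and not of the paper's exact MSAA module, the following NumPy sketch implements a single additive attention gate in the style of Attention U-Net: skip-connection and gating features are projected by 1x1 convolutions (per-pixel linear maps), fused additively, and turned into a sigmoid attention map that suppresses low-relevance regions. All function names, weight shapes, and sizes here are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def additive_attention_gate(x, g, Wx, Wg, psi):
    """Sketch of a generic additive attention gate (Attention-U-Net style).

    x   : skip-connection features, shape (H, W, Cx)
    g   : decoder (gating) features, shape (H, W, Cg)
    Wx  : (Cx, Ci) 1x1-conv weights applied to x
    Wg  : (Cg, Ci) 1x1-conv weights applied to g
    psi : (Ci, 1)  1x1-conv weights producing the attention map
    Returns the gated skip features x * alpha and the map alpha in (0, 1).
    """
    # A 1x1 convolution is a per-pixel linear map: (H, W, C) @ (C, Ci)
    q = np.maximum(x @ Wx + g @ Wg, 0.0)  # additive fusion followed by ReLU
    alpha = sigmoid(q @ psi)              # (H, W, 1) spatial attention map
    return x * alpha, alpha               # low-alpha regions are suppressed

# Tiny example with hypothetical sizes
rng = np.random.default_rng(0)
H, W, Cx, Cg, Ci = 8, 8, 4, 6, 3
x = rng.standard_normal((H, W, Cx))
g = rng.standard_normal((H, W, Cg))
Wx = rng.standard_normal((Cx, Ci)) * 0.1
Wg = rng.standard_normal((Cg, Ci)) * 0.1
psi = rng.standard_normal((Ci, 1)) * 0.1
gated, alpha = additive_attention_gate(x, g, Wx, Wg, psi)
print(gated.shape, alpha.shape)  # (8, 8, 4) (8, 8, 1)
```

In a multi-scale variant such as the one the abstract describes, feature maps from several decoder levels would contribute to the fused term before the sigmoid; the single-scale gate above is only the basic building block.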