Shrimp farming plays a vital role in food security, income generation, and employment for millions of people worldwide. One promising way to improve the efficiency of shrimp farming is shrimp image detection, which uses computer vision to identify shrimp in aquaculture ponds. However, detecting shrimp in complex environments is challenging: lighting conditions and water turbidity interfere with existing detection models, and current methods suffer from low detection accuracy and limited robustness. We therefore present a Transformer-based Shrimp Detector (TSD) framework for detecting shrimp in images of complex scenes. The algorithm employs a Convolutional Neural Network to extract image features, which are then fed into a transformer encoder–decoder. We also introduce a novel object query setting method in the decoder that uses random feature queries. A feed-forward neural network predicts the detection results, and two components of the matching loss are improved. We construct a dataset of images covering diverse shrimp farming environments and provide box-level annotations. Experimental results demonstrate that the proposed method achieves a detection accuracy of 82.7% Average Precision, surpassing mainstream object detection models.
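The pipeline described above (CNN feature extraction, a transformer encoder–decoder with random feature queries, and feed-forward prediction heads) can be sketched in minimal NumPy to illustrate the data flow. This is a hypothetical, illustrative sketch only: all module shapes, the 8×8 pooled feature map, the single-layer attention blocks, and the query count are assumptions, not the TSD implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention over token sequences.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def backbone(image, d_model=64):
    # Stand-in for the CNN: pool the image into an 8x8 feature map,
    # then project each cell to d_model channels (illustrative only).
    h, w, c = image.shape
    fmap = image[: h - h % 8, : w - w % 8].reshape(8, h // 8, 8, w // 8, c).mean(axis=(1, 3))
    proj = rng.standard_normal((c, d_model)) * 0.1
    return fmap.reshape(64, c) @ proj          # (64, d_model) token sequence

def detect(image, num_queries=10, d_model=64, num_classes=2):
    tokens = backbone(image, d_model)              # encoder input
    memory = attention(tokens, tokens, tokens)     # one self-attention "encoder" layer
    # Random feature queries, standing in for the decoder's object queries.
    queries = rng.standard_normal((num_queries, d_model))
    decoded = attention(queries, memory, memory)   # one cross-attention "decoder" layer
    # Feed-forward prediction heads: class logits and normalized boxes.
    w_cls = rng.standard_normal((d_model, num_classes + 1)) * 0.1  # +1 for "no object"
    w_box = rng.standard_normal((d_model, 4)) * 0.1
    logits = decoded @ w_cls
    boxes = 1 / (1 + np.exp(-(decoded @ w_box)))   # (cx, cy, w, h) in [0, 1]
    return logits, boxes

logits, boxes = detect(rng.standard_normal((128, 128, 3)))
print(logits.shape, boxes.shape)  # (10, 3) (10, 4)
```

Each query yields one candidate detection; in a DETR-style detector these candidates would then be matched to ground-truth boxes by a bipartite matching loss during training.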