Person search aims to jointly address person detection and person re-identification (re-ID). It faces several challenges, including significant scale variations, occlusions, and partially visible instances. In this paper, we propose a Multi-Scale Multi-Grained (MSMG) sequential network for end-to-end person search that is designed to alleviate these issues. To generate re-ID representations that are robust to scale changes, MSMG extracts multi-scale RoI features and aggregates them with a proposed Multi-Scale feature Aggregation Encoder (MSAE). The aggregated multi-scale re-ID features carry both richer semantic information and finer detail, which makes them more discriminative for identification. To make the re-ID representations more robust to occlusions and partial instances, MSMG further introduces a Multi-Grained feature Learning Decoder (MGLD), which adaptively decodes multi-grained re-ID representations with more accurate semantic information through a regional deformable cross-attention module. The resulting multi-scale multi-grained re-ID representation substantially improves identification accuracy in challenging cases. Comprehensive experiments demonstrate that our method achieves state-of-the-art performance on two benchmark datasets. On the challenging PRW benchmark, MSMG obtains the best reported result, with a mean average precision (mAP) of 61.3%.
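
For readers unfamiliar with the mAP metric quoted above, the following is a minimal sketch of how a retrieval-style mean average precision can be computed. It is not the evaluation code used for the reported results. The function names `average_precision` and `mean_average_precision` are hypothetical. The sketch also makes a simplifying assumption: each query's ranked gallery list already contains every ground-truth match, so the detection-matching step of a full person search protocol is omitted.

```python
from typing import List


def average_precision(ranked_hits: List[bool]) -> float:
    """AP for one query.

    ranked_hits[k] is True when the gallery item at rank k+1 shares the
    query identity. AP is the mean of precision@k over the ranks k at
    which a true match occurs (assumes no ground-truth match is missing
    from the list, which is a simplification).
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_match in enumerate(ranked_hits, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries: List[List[bool]]) -> float:
    """mAP: the mean of per-query AP values."""
    if not queries:
        return 0.0
    return sum(average_precision(q) for q in queries) / len(queries)


if __name__ == "__main__":
    # Two toy queries: matches at ranks 1 and 3, and a single match at rank 2.
    toy = [[True, False, True], [False, True]]
    print(f"mAP = {mean_average_precision(toy):.4f}")
```

In this toy case the per-query AP values are 5/6 and 1/2, so the mean is 2/3. A full person search evaluation would additionally account for ground-truth persons that the detector fails to localize.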