Point clouds are employed extensively in machine perception applications, and farthest point sampling (FPS) is a critical kernel for point cloud processing. With the rapid growth of point cloud scale, FPS introduces a large number of memory accesses, which become the bottleneck of large-scale point cloud processing. In this paper, we present QuickFPS, an architecture and algorithm co-design for farthest point sampling in large-scale point clouds. First, we systematically analyze the characteristics of FPS and put forward a bucket-based farthest point sampling algorithm. The algorithm introduces a two-level tree data structure to organize a large-scale point cloud into multiple buckets. By applying two mechanisms to the buckets, named merged computation and implicit computation, it significantly reduces external memory accesses and compute cost. Then, we design an efficient domain-specific accelerator for farthest point sampling in large-scale point clouds. The accelerator exploits different forms of parallelism to further improve efficiency. Finally, we evaluate QuickFPS on several widely used point cloud datasets, which include both small-scale and large-scale point clouds (up to 120,000 points). Overall, QuickFPS achieves speedups of 43.4x and 12.2x over a GTX 1080Ti GPU and the state-of-the-art point cloud accelerator PointAcc, respectively.
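For context, the conventional FPS kernel that motivates this work can be sketched as follows. This is a minimal illustration of the standard baseline algorithm, not the bucket-based algorithm of QuickFPS; the function names, the `start_index` parameter, and the use of squared Euclidean distance are assumptions made for the example. Each iteration rescans every point to update its distance to the sampled set, which is the source of the memory traffic described above.

```python
from typing import List, Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_sampling(
    points: List[Point], num_samples: int, start_index: int = 0
) -> List[int]:
    """Baseline FPS sketch (hypothetical helper, not the QuickFPS algorithm).

    Returns the indices of the sampled points in selection order.
    """
    if not 0 < num_samples <= len(points):
        raise ValueError("num_samples must be in [1, len(points)]")
    if not 0 <= start_index < len(points):
        raise ValueError("start_index is out of range")

    # min_dist[i] holds the squared distance from point i to the sampled set.
    min_dist = [float("inf")] * len(points)
    selected = [start_index]

    for _ in range(num_samples - 1):
        last = points[selected[-1]]
        # Update pass: every point is read once per iteration.
        for i, p in enumerate(points):
            d = squared_distance(p, last)
            if d < min_dist[i]:
                min_dist[i] = d
        # Selection pass: pick the point farthest from the sampled set
        # (the lowest index wins on ties).
        next_index = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(next_index)

    return selected


if __name__ == "__main__":
    cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
    print(farthest_point_sampling(cloud, num_samples=3))
```

With N points and S samples, this sketch performs on the order of N·S point reads, so the cost grows quickly as the point cloud scales.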