Registration of large-scale outdoor Terrestrial Laser Scanning (TLS) point clouds still faces many challenges in scenes with symmetric and repetitive elements (e.g., parks, forests, and tunnels), weak geometric features (e.g., underground excavations), and dramatic changes between different phases (e.g., mountains). To address these issues, a novel neural network, JoKDNet, is proposed to jointly learn keypoint detection and feature description, improving the feasibility and accuracy of point cloud registration. First, a novel keypoint detection module automatically learns a score for each sampled point and takes the Top-k highest-scoring sampled points as the detected keypoints. Second, an enhanced feature description module learns the feature representation of each keypoint by fusing hierarchical local features and context features. Third, a loss function is designed to make the detected keypoints more distinguishable for matching: it simultaneously maximizes the feature distance between non-corresponding keypoints and minimizes the feature distance between corresponding keypoints. Finally, a distance matrix module and RANdom SAmple Consensus (RANSAC) are used to determine the correspondences between the source and target point clouds, from which the transformation is calculated. Comprehensive experiments on five challenging scenes (i.e., park, forest, tunnel, underground excavation, and mountain) from two datasets (WHU-TLS and ETH-TLS) show that JoKDNet is effective in terms of registration error and robust to varying scenes, with a maximum rotation error below 0.06° and a maximum translation error below 0.84 m without Iterative Closest Point (ICP) refinement.
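The Top-k keypoint selection and the two-sided feature-distance objective can be illustrated with a minimal NumPy sketch. It assumes the per-point scores and keypoint descriptors are already given (in JoKDNet they come from the learned detection and description modules). The margin-based, hardest-negative contrastive loss shown here is one common way to pull corresponding descriptors together and push non-corresponding ones apart; it is not necessarily JoKDNet's exact formulation. The function names and the `pos_margin`/`neg_margin` values are hypothetical.

```python
import numpy as np

def select_topk_keypoints(points, scores, k):
    """Return the k sampled points with the highest scores, plus their indices.

    points: (N, 3) sampled points; scores: (N,) per-point scores
    (assumed given here; a learned detection module would predict them).
    """
    idx = np.argsort(-scores)[:k]  # indices of the k largest scores
    return points[idx], idx

def pairwise_distances(a, b):
    """Euclidean distance matrix between rows of a (M, D) and rows of b (N, D)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

def margin_descriptor_loss(desc_src, desc_tgt, pos_margin=0.1, neg_margin=1.4):
    """Hypothetical contrastive loss; row i of desc_src corresponds to row i of desc_tgt.

    Penalizes corresponding pairs farther apart than pos_margin and the
    hardest (closest) non-corresponding pair closer than neg_margin.
    """
    d = pairwise_distances(desc_src, desc_tgt)
    n = d.shape[0]
    pos = np.diag(d)  # distances between corresponding keypoints
    # Push the diagonal far away so it is ignored when searching for negatives.
    off_diag = d + np.eye(n) * 1e9
    hardest_neg = np.minimum(off_diag.min(axis=1), off_diag.min(axis=0))
    pos_loss = np.maximum(pos - pos_margin, 0.0)
    neg_loss = np.maximum(neg_margin - hardest_neg, 0.0)
    return float(np.mean(pos_loss + neg_loss))
```

For example, with scores `[0.1, 0.9, 0.5, 0.7, 0.2]` and `k = 2`, the selected indices are `[1, 3]`. Well-separated, perfectly matching descriptors give zero loss, whereas collapsed (identical) descriptors are penalized by the negative term.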
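The final matching stage (a descriptor distance matrix followed by RANSAC and a transformation estimate) can be sketched as follows. This sketch assumes that correspondences are taken as mutual nearest neighbours in the distance matrix, and that each RANSAC hypothesis is a rigid transform fitted to three matches with the SVD-based (Kabsch) least-squares solution. Both are standard choices but are assumptions here, not confirmed details of JoKDNet. The function names, `iters`, and `thresh` are hypothetical.

```python
import numpy as np

def mutual_nn_matches(desc_src, desc_tgt):
    """Pairs (i, j) where i and j are each other's nearest neighbour in feature space."""
    d = np.linalg.norm(desc_src[:, None, :] - desc_tgt[None, :, :], axis=-1)
    nn_st = d.argmin(axis=1)  # best target keypoint for each source keypoint
    nn_ts = d.argmin(axis=0)  # best source keypoint for each target keypoint
    src_idx = np.arange(len(desc_src))
    mutual = nn_ts[nn_st] == src_idx
    return np.stack([src_idx[mutual], nn_st[mutual]], axis=1)

def rigid_transform(src, tgt):
    """Least-squares rotation R and translation t with tgt ≈ src @ R.T + t (Kabsch/SVD)."""
    c_src, c_tgt = src.mean(axis=0), tgt.mean(axis=0)
    h = (src - c_src).T @ (tgt - c_tgt)
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against a reflection
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, c_tgt - r @ c_src

def ransac_registration(src, tgt, matches, iters=1000, thresh=0.1, seed=0):
    """Hypothetical RANSAC loop over 3-point samples of the putative matches."""
    rng = np.random.default_rng(seed)
    p, q = src[matches[:, 0]], tgt[matches[:, 1]]
    best_inliers = np.zeros(len(matches), dtype=bool)
    for _ in range(iters):
        sample = rng.choice(len(matches), size=3, replace=False)
        r, t = rigid_transform(p[sample], q[sample])
        residual = np.linalg.norm(p @ r.T + t - q, axis=1)
        inliers = residual < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        raise ValueError("RANSAC found fewer than 3 inliers")
    # Refit the transform on all inliers of the best hypothesis.
    r, t = rigid_transform(p[best_inliers], q[best_inliers])
    return r, t, best_inliers
```

Because the refit uses only the consensus set, a minority of wrong correspondences (for example, matches between repetitive structures) does not bias the final rotation and translation.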