Automatic road extraction from historical maps is an important task for understanding past transportation conditions and for conducting spatiotemporal analyses that reveal information about historical events and human activities over the years. This research aimed to identify the best-performing architecture, encoder, and hyperparameter settings for the historical road extraction task. We used a dataset of 7076 patches of $256 \times 256$ pixels generated from scanned historical Deutsche Heereskarte 1:200 000 Türkei (DHK 200 Turkey) maps and their corresponding digitized ground truth masks for five different road types. We first tested the widely used Unet++ and Deeplabv3 architectures. We also evaluated the contribution of attention models by implementing Unet++ with the concurrent spatial and channel squeeze-and-excitation block and the multiscale attention net. We achieved the best results with the split-attention network (Timm-resnest200e) encoder and the Unet++ architecture, with 98.99% overall accuracy, 41.99% intersection over union, 51.41% precision, 69.7% recall, and 57.72% F1 score. Our output weights can be used directly for inference on other DHK maps and for transfer learning on similar or different historical maps. The proposed architecture could also be applied in other road extraction studies.
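As an illustration of the reported evaluation metrics, the following minimal sketch computes overall accuracy, intersection over union, precision, recall, and F1 score for a single binary road mask. The function name `segmentation_metrics` and the toy masks are hypothetical and not taken from the study. How the study aggregates these metrics across the five road types and across patches is not stated here, so this should be read as the standard per-mask definitions only.

```python
def segmentation_metrics(pred, truth):
    """Compute pixel-wise metrics for two equally sized binary masks.

    pred, truth: 2D lists of 0/1 values (1 = road pixel, 0 = background).
    This is an illustrative helper, not the evaluation code of the study.
    """
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # road predicted, road in ground truth
            elif p and not t:
                fp += 1          # road predicted, background in ground truth
            elif not p and t:
                fn += 1          # background predicted, road in ground truth
            else:
                tn += 1          # background in both

    def ratio(numerator, denominator):
        # Return 0.0 instead of raising when a class is absent.
        return numerator / denominator if denominator else 0.0

    return {
        "overall_accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "iou": ratio(tp, tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Toy 4x4 example: 2 true positives, 1 false positive, 1 false negative.
pred = [
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
truth = [
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
metrics = segmentation_metrics(pred, truth)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In the toy example, the overall accuracy (0.875) is well above the intersection over union (0.5) because background pixels dominate the mask. The same effect is expected on map patches in which roads occupy only a small share of the pixels.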