Existing deep learning methods have shown outstanding performance in predicting drug–target interactions. However, they still have limitations: (1) the over-reliance on locally extracted features by some single encoders, with insufficient consideration of global features, and (2) the inadequate modeling and learning of local crucial interaction sites in drug–target interaction pairs. In this study, we propose a novel drug–target interaction prediction model called the Neural Fingerprint and Self-Attention Mechanism (NFSA-DTI), which effectively integrates the local information of drug molecules and target sequences with their respective global features. The neural fingerprint method is used in this model to extract global features of drug molecules, while the self-attention mechanism is utilized to enhance CNN’s capability in capturing the long-distance dependencies between the subsequences in the target amino acid sequence. In the feature fusion module, we improve the bilinear attention network by incorporating attention pooling, which enhances the model’s ability to learn local crucial interaction sites in the drug–target pair. The experimental results on three benchmark datasets demonstrated that NFSA-DTI outperformed all baseline models in predictive performance. Furthermore, case studies illustrated that our model could provide valuable insights for drug discovery. Moreover, our model offers molecular-level interpretations.