The proper assessment of target-specific compound selectivity is paramount in drug discovery, as it supports the identification of drug-target interactions (DTIs) and the discovery of potential leads. Accurate prediction of an unbiased drug-target binding affinity (DTA) metric is therefore pivotal to understanding the binding process. Most in silico approaches, however, neglect both the inter-dependency of the proteomics, chemical, and pharmacological spaces and explainability during model construction. Furthermore, these methods have yet to actively incorporate information associated with binding pockets into the learning process, which is essential to both DTA prediction performance and model explainability. In this study, we propose an end-to-end binding-region-guided Transformer-based architecture that simultaneously predicts the 1D binding pocket and the binding affinity of DTI pairs, where the prediction of the 1D binding pocket guides and conditions the prediction of DTA. This architecture uses 1D raw sequential data to represent proteins and 1D raw structural data to represent compounds, and combines multiple Transformer-Encoder blocks to capture and learn the proteomics, chemical, and pharmacological contexts. The predicted 1D binding pocket conditions the attention mechanism of the Transformer-Encoder that learns the pharmacological space, in order to model the inter-dependency among binding-related positions. The results show that the proposed architecture, TAG-DTA, achieved the best performance in DTA prediction compared to state-of-the-art benchmarks, including on unknown subsets of the proteomics and chemical representation spaces. Moreover, the 1D binding pocket prediction increases the discriminative power and robustness of the aggregate representation of the pharmacological space and improves DTA prediction performance.
Overall, this study validates the applicability of an end-to-end Transformer-based architecture in the context of drug discovery and indicates that combining computationally different yet contextually related tasks is critical to new findings in the DTI domain. Additionally, it shows that TAG-DTA can provide an improved understanding of both DTIs and its own predictions, owing to the nature of the attention blocks and the prediction of the 1D binding pocket. The data and source code used in this study are available at: https://github.com/larngroup/TAG-DTA.
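
To make the idea of a predicted 1D binding pocket conditioning an attention mechanism more concrete, the following is a minimal sketch in plain Python. It is not the TAG-DTA implementation: the abstract does not specify how the conditioning is realized, so the additive bias on pocket positions, the function name `pocket_biased_attention`, the parameter `pocket_bias`, and the toy inputs are all assumptions made purely for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pocket_biased_attention(queries, keys, values, pocket_mask, pocket_bias=2.0):
    """Scaled dot-product attention with an additive bias on pocket positions.

    ASSUMPTION: conditioning is modeled as adding `pocket_bias` to the score of
    every key position flagged in `pocket_mask` (a hypothetical per-residue
    binding pocket prediction). The real architecture may condition attention
    in a different way.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = []
        for k, in_pocket in zip(keys, pocket_mask):
            score = sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            if in_pocket:
                score += pocket_bias  # favor predicted binding positions
            scores.append(score)
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors, one output channel at a time.
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return outputs, weights


# Toy example: three protein positions, one query token (illustrative values only).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
queries = [[0.5, 0.5]]
pocket_mask = [False, True, False]  # hypothetical prediction: position 1 is in the pocket

outputs, weights = pocket_biased_attention(queries, keys, values, pocket_mask)
print(weights[0])
```

In this sketch, the attention weight on the flagged position increases relative to unbiased attention while the weights still sum to one, which is one simple way a pocket prediction could steer the aggregate representation toward binding-related positions.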