With misinformation increasing across digital platforms, incongruent news detection is becoming an important research problem. Earlier studies have used various feature engineering approaches and embedding-based deep learning models to capture incongruity between news headlines and their respective bodies. These studies have broadly considered different combinations of bag-of-words-based features, sequential encoding, hierarchical encoding, and headline-guided attention-based encoding, among others, of the text in headlines and bodies. In this article, we address two important limitations of hierarchical encoding and headline-guided attention-based encoding methods. First, existing hierarchical encoding-based studies limit the hierarchical structure of the body of a news article to the paragraph level only, overlooking the importance of incorporating long-term dependence from the word level to the sentence, paragraph, and body levels. Second, existing headline-guided attention-based encoding focuses on body contents that are contextually similar to the headline, overlooking the importance of incorporating contextually dissimilar contents. Motivated by these observations, this article proposes a gated recursive and sequential deep hierarchical encoding (GraSHE) method for detecting incongruent news articles, which extends the hierarchical structure of the news body from the body level down to the word level and incorporates an incongruity weight. Across various experimental setups on three publicly available benchmark datasets, the proposed model outperforms baseline models based on bag-of-words features, sequential encoding, hierarchical encoding, and headline-guided attention-based encoding. To further validate the performance of the proposed model, we conduct several ablation studies.
The following key observations can be made from the ablation studies: 1) models with hierarchical encoding outperform models with nonhierarchical encoding; 2) recursive encoding of sentences boosts model performance compared with sequential encoding of sentences within paragraphs; and 3) incongruent news article detection is domain-dependent. Incorporating explicit features further boosts the performance of the proposed model and also decreases the domain dependence of the models.
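To make the contrast between attending to contextually similar and contextually dissimilar body contents concrete, the following minimal Python sketch scores toy body-unit vectors against a headline vector. It is an illustration under stated assumptions, not the GraSHE formulation: the cosine scoring, the softmax over negated similarities used as an "incongruity weight", and all function names and vectors are hypothetical choices made only for this example.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0

def headline_guided_weights(headline, body_units):
    # ASSUMPTION: similarity is cosine, and the incongruity weight is a
    # softmax over negated similarities. The abstract does not specify either.
    sims = [cosine(headline, u) for u in body_units]
    congruity = softmax(sims)                   # favors similar units
    incongruity = softmax([-s for s in sims])   # favors dissimilar units
    return congruity, incongruity

def weighted_sum(weights, vectors):
    # Weighted combination of body-unit vectors into a single summary vector.
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

# Toy 2-d vectors standing in for encoded headline and body units (hypothetical).
headline = [1.0, 0.0]
body = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

congruity, incongruity = headline_guided_weights(headline, body)
similar_summary = weighted_sum(congruity, body)
dissimilar_summary = weighted_sum(incongruity, body)
```

In this toy setting, the similarity-based weights concentrate on the first body unit, which is aligned with the headline, while the negated-similarity weights concentrate on the last unit, which opposes it. A model that uses only the first set of weights would largely ignore the content most likely to signal incongruity.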