Fake news is a real problem, and unfortunately it appears to be getting worse. Although fake news detection methods have made significant progress, current multimodal approaches integrate cross-modal features directly without considering that uncorrelated semantic representations may introduce noise into the multimodal features. This noise reduces detection accuracy by obscuring the subtle differences between text and images on which the identification of fake news often relies. To address these challenges, we propose a unified Complementary Attention Fusion with an Optimized Deep Neural Network (CAF-ODNN) that captures subtle cross-modal relationships for multimodal fake news detection. CAF introduces image captioning to represent images semantically, enabling bidirectional complementary attention between modalities, based on scaled dot products, to learn fine-grained correlations. A dedicated alignment and normalization component calibrates the fused representations using channel statistics, ensuring that semantics are preserved across modalities during the interaction and improving on the simple concatenation used in existing fusion approaches. To improve feature extraction, an Optimized Deep Neural Network (ODNN) that exploits compositional learning is implemented. The ODNN uses three fully connected layers to learn higher-level representations from the CAF-fused features, and model parameters are then tuned systematically, beyond standard random search, to identify configurations that maximize feature quality and detection accuracy. Our proposed method outperforms comparable approaches on standard metrics across four real-world datasets, highlighting the importance of complementary attention fusion with optimization for identifying fake news.
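To make the fusion concrete, the following is a minimal PyTorch sketch of the pipeline described above: bidirectional scaled dot-product attention between text features and caption-based image features, channel-wise calibration of the fused representation, and a three-layer fully connected head. All module names, dimensions, and the use of LayerNorm for the alignment-and-normalization step are illustrative assumptions rather than the paper's actual implementation, and the hyperparameter optimization stage is omitted.

```python
# Minimal sketch of the CAF-ODNN pipeline described in the abstract.
# Names, dimensions, and normalization choice are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ComplementaryAttentionFusion(nn.Module):
    """Bidirectional scaled dot-product attention between text features and
    caption-based image features, followed by channel-wise calibration."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.scale = dim ** -0.5
        # Calibration of the fused representation via channel statistics
        # (approximated here as a LayerNorm-style normalization).
        self.norm = nn.LayerNorm(2 * dim)

    def forward(self, text: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        # text:  (batch, T, dim) token-level text features
        # image: (batch, S, dim) caption-token features representing the image
        # Text attends to image semantics ...
        t2i = F.softmax(text @ image.transpose(1, 2) * self.scale, dim=-1) @ image
        # ... and image semantics attend to text (the complementary direction).
        i2t = F.softmax(image @ text.transpose(1, 2) * self.scale, dim=-1) @ text
        # Pool each attended sequence, fuse, then calibrate channels.
        fused = torch.cat([t2i.mean(dim=1), i2t.mean(dim=1)], dim=-1)
        return self.norm(fused)

class ODNN(nn.Module):
    """Three fully connected layers learning higher-level representations
    from the CAF-fused features, ending in a real/fake logit pair."""
    def __init__(self, in_dim: int = 512, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),
        )

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        return self.net(fused)

# Usage: fuse dummy text/caption features and classify.
caf, odnn = ComplementaryAttentionFusion(), ODNN()
text = torch.randn(4, 32, 256)   # 4 posts, 32 text tokens
image = torch.randn(4, 16, 256)  # 4 images, 16 caption tokens
logits = odnn(caf(text, image))  # shape (4, 2)
```

Pooling the attended sequences before concatenation keeps the fused vector at a fixed size regardless of sequence length; the actual model may instead fuse at the token level before classification.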