Implementing artificial intelligence (AI) in the Internet of Things (IoT) involves a move from the cloud to the heterogeneous and low-power edge, following an urgent demand for deploying complex training tasks in a distributed and reliable manner. This work proposes a self-aware distributed deep learning (DDL) framework for IoT applications, which is applicable to heterogeneous edge devices aiming to improve adaptivity and amortize the training cost. The self-aware design including the dynamic self-organizing approach and the self-healing method enhances the system reliability and resilience. Three typical edge devices are adopted with cross-platform Docker deployment: Personal Computers (PC) for general computing devices, Raspberry Pi 4Bs (Rpi) for resource-constrained edge devices, and Jetson Nanos (Jts) for AI-enabled edge devices. Benchmarked with ResNet-32 on CIFAR-10, the training efficiency of tested distributed clusters is increased by 8.44× compared to the standalone Rpi. The cluster with 11 heterogeneous edge devices achieves a training efficiency of 200.4 images/s and an accuracy of 92.45%. Results prove that the self-organizing approach functions well with dynamic changes like devices being removed or added. The self-healing method is evaluated with various stabilities, cluster scales, and breakdown cases, testifying that the reliability can be largely enhanced for extensively distributed deployments. The proposed DDL framework shows excellent performance for training implementation with heterogeneous edge devices in IoT applications with high-degree scalability and reliability.