To enable a markerless augmented reality system that can operate all day, the authors implemented a low-power Basic On-Chip Network-Augmented Reality (BONE-AR) processor that performs object recognition, camera pose estimation, and 3D graphics rendering in real time on HD-resolution video input. BONE-AR employs six clusters of heterogeneous SIMD processors distributed on a mesh-topology network-on-chip (NoC) to exploit data- and task-level parallelism. A visual attention algorithm reduces the overall workload by removing background clutter from the input video frames, but the resulting dynamically fluctuating workload causes NoC congestion. To prevent throughput degradation in the task-level pipeline, the authors propose a congestion-aware scheduler that detects and resolves this congestion.
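
The abstract does not say how the scheduler detects or resolves congestion, so the following is only an illustrative sketch of the general idea, not the authors' design. It assumes, hypothetically, that congestion can be approximated by the number of tasks queued at each of the six clusters, and that it is resolved by migrating tasks from the most loaded cluster to the least loaded one. The names `Cluster`, `rebalance`, and `CONGESTION_THRESHOLD` are invented for this example.

```python
from dataclasses import dataclass, field
from typing import List

NUM_CLUSTERS = 6          # from the text: six clusters of SIMD processors
CONGESTION_THRESHOLD = 4  # hypothetical: queue depth above which a cluster counts as congested


@dataclass
class Cluster:
    """Hypothetical model of one processor cluster and its pending-task queue."""
    cluster_id: int
    queue: List[str] = field(default_factory=list)

    def is_congested(self) -> bool:
        # Assumed detection rule: queue depth is a proxy for NoC congestion.
        return len(self.queue) > CONGESTION_THRESHOLD


def rebalance(clusters: List[Cluster]) -> int:
    """Move tasks from the busiest cluster to the idlest one.

    Stops when the busiest cluster is no longer congested or when a
    further move would not reduce the imbalance. Returns the number
    of tasks migrated.
    """
    moves = 0
    while True:
        busiest = max(clusters, key=lambda c: len(c.queue))
        idlest = min(clusters, key=lambda c: len(c.queue))
        gap = len(busiest.queue) - len(idlest.queue)
        if not busiest.is_congested() or gap <= 1:
            return moves
        idlest.queue.append(busiest.queue.pop())
        moves += 1


# Usage: a workload spike lands on cluster 0, as might happen when the
# attended region of a frame suddenly grows.
clusters = [Cluster(i) for i in range(NUM_CLUSTERS)]
clusters[0].queue = [f"task{i}" for i in range(12)]
migrated = rebalance(clusters)
```

A real on-chip scheduler would more likely observe link or buffer occupancy in the network itself and would have to respect which cluster types can execute which pipeline stage; both are omitted here for brevity.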