To enable a markerless augmented reality system that can operate all day, the low-power BONE-AR processor is implemented to execute object recognition, camera pose estimation, and 3D graphics rendering in real time on HD-resolution video input. BONE-AR adopts six clusters of heterogeneous SIMD processors distributed over a mesh-topology network-on-chip (NoC) to exploit both data-level and task-level parallelism. A visual attention algorithm reduces the overall workload by removing background clutter from the input video frames, but it also causes NoC congestion because the remaining workload fluctuates dynamically. We propose a congestion-aware scheduler (CAS) that detects and resolves NoC congestion to prevent throughput degradation of the task-level pipeline.
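
The idea of detecting and resolving congestion at dispatch time can be illustrated with a small software sketch. This is not the CAS hardware described here: the abstract does not say how congestion is measured or resolved, so the buffer-occupancy metric, the 0.75 threshold, the fallback to the least-occupied cluster, and the names `Cluster`, `is_congested`, and `dispatch` are all assumptions made for illustration. The sketch also ignores cluster heterogeneity and treats every cluster as able to run every task.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical model of one SIMD cluster with a bounded input buffer."""
    name: str
    capacity: int  # assumed input-buffer depth, in tasks
    queue: deque = field(default_factory=deque)

    def occupancy(self) -> float:
        # Fraction of the input buffer currently in use.
        return len(self.queue) / self.capacity


def is_congested(cluster: Cluster, threshold: float = 0.75) -> bool:
    # Assumed detection rule: a cluster counts as congested once its
    # buffer occupancy reaches the threshold.
    return cluster.occupancy() >= threshold


def dispatch(task, preferred: Cluster, clusters, threshold: float = 0.75) -> Cluster:
    """Queue the task on the preferred cluster unless it is congested,
    in which case fall back to the least-occupied cluster."""
    target = preferred
    if is_congested(preferred, threshold):
        target = min(clusters, key=lambda c: c.occupancy())
    target.queue.append(task)
    return target
```

With six clusters of depth 4, a preferred cluster already holding three tasks would be treated as congested, and the next task would be redirected to an idle cluster instead of lengthening the stalled queue.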