With the rapid development of cognitive radio technology, multilayer heterogeneous cognitive radio computing platforms with large computing capacity, high throughput, ultralarge bandwidth, and ultralow latency have become a research hotspot. To address the core scheduling problem of such platforms, this paper abstracts the bidirectional interconnection topology, node computing capacity, and internode communication capability of the heterogeneous computing platform into an undirected graph model, and abstracts the dependent components of a streaming task, together with their computing requirements and intercomponent communication requirements, into a directed acyclic graph (DAG) model. The task-scheduling problem is thereby transformed into the problem of deploying and scheduling a DAG onto an undirected graph. To solve this graph model efficiently, this paper first derives a component scheduling sequence from the dependencies among the functional components of a streaming domain task. Then, following this scheduling sequence, the agents of algorithms such as ant colony optimization (ACO) and Q-learning select functional components, deploy them to different computing nodes, and calculate the scheduling cost, which in turn guides the agents' search of the solution space; in this way the scheduling algorithms are adapted to the intelligent scheduling of domain tasks. On this basis, this paper proposes QACO, an ACO-based intelligent scheduling algorithm for domain tasks that is optimized with Q-learning. QACO uses the Q-table matrix of Q-learning as the initial pheromone of the ant colony algorithm, which not only overcomes the curse of dimensionality of the Q-learning algorithm but also accelerates the convergence of the ant colony intelligent scheduling algorithm, reduces the task scheduling length, and further enhances the ability of existing scheduling algorithms to search the solution space.
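The pipeline described above can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: the function names, the constant communication delay, the work/speed cost model, and the shift that turns Q-values into positive pheromone are all assumptions made for illustration. The sketch derives a scheduling sequence from the DAG dependencies, seeds a pheromone table from a Q-table, and computes the scheduling length of one deployment.

```python
import random
from collections import deque


def scheduling_sequence(succ):
    """Topological order of the task DAG (Kahn's algorithm).

    succ maps each component to the list of components that depend on it.
    """
    indeg = {v: 0 for v in succ}
    for v in succ:
        for w in succ[v]:
            indeg[w] += 1
    ready = deque(sorted(v for v in succ if indeg[v] == 0))
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    if len(order) != len(succ):
        raise ValueError("task graph contains a cycle")
    return order


def pheromone_from_q(q_table, floor=1e-3):
    """Assumed mapping: shift Q-values so every entry is strictly positive.

    q_table maps each component to a list of Q-values, one per computing node.
    """
    lowest = min(min(row) for row in q_table.values())
    return {c: [q - lowest + floor for q in row] for c, row in q_table.items()}


def pick_node(pheromone_row, rng):
    """Choose a node index with probability proportional to its pheromone."""
    return rng.choices(range(len(pheromone_row)), weights=pheromone_row)[0]


def schedule_length(order, succ, work, speed, assign, comm_delay=1.0):
    """Makespan of one deployment under a simplified, assumed cost model.

    A component runs for work / speed on its node; a fixed comm_delay is
    added when a predecessor was deployed to a different node.
    """
    preds = {v: [] for v in succ}
    for v in succ:
        for w in succ[v]:
            preds[w].append(v)
    node_free = [0.0] * len(speed)
    finish = {}
    for v in order:
        n = assign[v]
        data_ready = max(
            (finish[p] + (comm_delay if assign[p] != n else 0.0) for p in preds[v]),
            default=0.0,
        )
        start = max(node_free[n], data_ready)
        finish[v] = start + work[v] / speed[n]
        node_free[n] = finish[v]
    return max(finish.values())
```

In a full ant colony loop, each ant would call `pick_node` for every component in the scheduling sequence, evaluate the resulting deployment with `schedule_length`, and reinforce the pheromone of short schedules. Seeding the table from Q-values instead of a uniform constant is what biases the first iterations toward deployments that Q-learning already rated well.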
Three experimental test scenarios are designed on randomly generated DAG domain task graphs to verify the performance of the algorithm. The experimental results show that, compared with the Q-learning, ACO, and genetic algorithm (GA) baselines, the proposed algorithm improves the convergence speed of the solution by 72.3%, 63.4%, and 64% on average, reduces the scheduling length by 2.8%, 2.2%, and 0.9% on average, and increases the parallel speedup ratio by 2.8%, 2.1%, and 0.9% on average, respectively. The practical application value of the algorithm is analyzed through a typical radar task simulation; its load balancing, however, still needs further improvement.