Human needs drive the evolution of computing paradigms, and soft computing has emerged as an effective way to handle everyday problems: it reduces cost, improves robustness, and is easier to work with. A typical example is collecting video data from smart sensors and using that information to predict and monitor crowd behavior. Nowadays, with the development of cyber–physical systems and artificial intelligence, traditional data collection and analysis systems face the risks of low transparency and poor data security, making it difficult to obtain accurate prediction results. Therefore, this paper proposes a novel prediction machine based on a self-learning generative adversarial network for soft-computing applications. It collects data through a series of high-precision IoT sensor devices, performs preliminary preprocessing, and then solves the crowd-prediction problem with deep learning algorithms, continuously optimizing internal parameters to obtain a reliable and accurate prediction result. The focus of this work is on the accuracy of data collection and preprocessing and on the crowd-prediction algorithm itself.

The prediction algorithm can be used to estimate and monitor crowd flow in public places such as stations, airports, large exhibitions, and tourist attractions, and can help prevent crowding, trampling, and other congestion incidents. Accordingly, the new prediction machine integrates video capture, upload and display, data analysis, and early-warning operations on embedded devices, and automatically predicts crowd density.

In constructing the network, a feature self-learning module is first merged into the generator's feature-extraction stage to obtain a clearer generated density map. Second, to avoid blur in the generated maps, an adversarial loss is constructed between the generator and the discriminator. Finally, to handle multiple scales, two generator networks are built, adapted to large and small scales respectively, so as to extract semantic information at different scales, and a cross-scale consistency loss constrains the generated density maps.
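To make the two-scale generator design concrete, the following is a minimal PyTorch sketch. The section above does not fix the layer configuration, so the channel widths, the use of kernel size as the scale knob, and the squeeze-and-excitation-style channel reweighting standing in for the feature self-learning module are all illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class SelfLearningBlock(nn.Module):
    """Hypothetical stand-in for the feature self-learning module:
    squeeze-and-excitation-style channel reweighting."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # global context
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                  # per-channel weights
        )

    def forward(self, x):
        return x * self.fc(x)  # reweight feature channels

class DensityGenerator(nn.Module):
    """One generator branch; the kernel size sets its receptive field,
    so a large kernel targets large heads and a small kernel small ones."""
    def __init__(self, kernel=3):
        super().__init__()
        pad = kernel // 2
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel, padding=pad), nn.ReLU(inplace=True),
            SelfLearningBlock(32),
            nn.Conv2d(32, 64, kernel, padding=pad), nn.ReLU(inplace=True),
            SelfLearningBlock(64),
        )
        self.head = nn.Conv2d(64, 1, 1)  # single-channel density map

    def forward(self, x):
        return self.head(self.features(x))

# Two branches with different receptive fields, as described above.
gen_large = DensityGenerator(kernel=9)  # large-scale semantics
gen_small = DensityGenerator(kernel=3)  # small-scale semantics
```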
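Continuing the sketch, one generator training step could combine three terms: a pixel-wise density loss against the ground truth, the adversarial loss that counteracts blur, and the cross-scale consistency loss that ties the two branches together. The loss weights and the patch-style discriminator below are hypothetical choices for illustration, not values taken from this work.

```python
import torch
import torch.nn.functional as F

# Minimal patch-style discriminator over density maps (its own update on
# real vs. generated maps is omitted here for brevity).
disc = nn.Sequential(
    nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(32, 1, 4, stride=2, padding=1),
)

def generator_step(frames, gt_density, lambda_adv=0.01, lambda_cons=1.0):
    # gen_large / gen_small come from the previous sketch.
    d_large = gen_large(frames)
    d_small = gen_small(frames)

    # Pixel-wise regression toward the ground-truth density map.
    loss_pix = F.mse_loss(d_large, gt_density) + F.mse_loss(d_small, gt_density)

    # Adversarial term: generated maps should look "real" to the
    # discriminator, which discourages over-smoothed output.
    logits_l, logits_s = disc(d_large), disc(d_small)
    real = torch.ones_like(logits_l)
    loss_adv = (F.binary_cross_entropy_with_logits(logits_l, real)
                + F.binary_cross_entropy_with_logits(logits_s, real))

    # Cross-scale consistency: the two branches must agree.
    loss_cons = F.mse_loss(d_large, d_small)

    return loss_pix + lambda_adv * loss_adv + lambda_cons * loss_cons

# Usage with dummy data:
frames = torch.randn(2, 3, 128, 128)
gt = torch.rand(2, 1, 128, 128)
generator_step(frames, gt).backward()
```

Weighting the adversarial term lightly relative to the pixel loss is a common heuristic in GAN-based regression; it keeps training stable while still penalizing blurry density maps.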