In order to reduce the problem of mismatch between high-performance processors and DRAM speeds, current processors have added a cache structure, but the low cache hit rate also seriously affects the actual performance of the program. Data prefetching technology can alleviate the problems of memory access latency and low hit rate caused by the speed difference between high-performance processors and DRAM. Based on the LLVM open source compiler, this article first implements the data prefetch module on the Shenwei platform. This paper improves the prefetch distance algorithm, proposes a new prefetch scheduling algorithm, introduces a cost model to evaluate the prefetch revenue, and accurately determines the insertion timing of the prefetch instruction to improve the cache hit rate. SPEC2006 performance test results show that after optimization, Shenwei 1621 processor single-core can achieve a maximum performance improvement of 50%, and an average performance improvement of 11%.