Abstract: The latency of indirect memory accesses often limits application performance, and prefetching is an effective way to hide it. Although the Shenwei platform developed in China supports software and hardware prefetching for conventional access patterns, its GNU Compiler Collection (GCC) compilers cannot automatically insert prefetches for indirect memory accesses. To solve this problem, a complete indirect prefetching optimization pass is developed on the basis of the Shenwei GCC. The pass uses a depth-first search to find indirect memory references that depend on loop induction variables and generates appropriate software prefetches for them. On a set of memory-bound benchmarks, the automatic prefetching pass achieves an average speedup of 1.16x on the SW1621 processor.
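
As a rough illustration of the transformation described above, the following C sketch shows an indirect reference `data[idx[i]]` driven by the loop induction variable `i`, together with a hand-written version containing the kind of software prefetches such a pass might insert. This is not the output of the Shenwei GCC pass: the function names and the prefetch distance `PREFETCH_DIST` are assumptions made for illustration, and the only compiler facility used is GCC's `__builtin_prefetch`.

```c
#include <stddef.h>

/* Hypothetical prefetch distance in loop iterations; a real pass would
   choose it from the target's memory latency and the loop body cost. */
#define PREFETCH_DIST 16

/* Baseline loop: data[idx[i]] is an indirect memory reference whose
   address depends on the induction variable i through idx[i]. */
double sum_indirect(const double *data, const int *idx, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += data[idx[i]];
    return sum;
}

/* Same loop with software prefetches added by hand. */
double sum_indirect_prefetch(const double *data, const int *idx, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* Prefetch the index array further ahead, so that the real load
           of idx[i + PREFETCH_DIST] below is likely to hit in cache. */
        if (i + 2 * PREFETCH_DIST < n)
            __builtin_prefetch(&idx[i + 2 * PREFETCH_DIST], 0, 3);

        /* Computing the indirect address needs a real load of the index,
           so it must be guarded against reading past the end of idx. */
        if (i + PREFETCH_DIST < n)
            __builtin_prefetch(&data[idx[i + PREFETCH_DIST]], 0, 3);

        sum += data[idx[i]];
    }
    return sum;
}
```

Both functions return the same result; the prefetches only affect cache behavior. The bounds check on the indirect prefetch matters because, unlike the prefetch itself, the load of `idx[i + PREFETCH_DIST]` can fault or read invalid data if it runs past the array.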