Abstract: In most cases, memory access time accounts for a much larger share of a program's running time than computation does, so the way a program accesses memory can significantly affect its performance. Testing shows that ATLAS, ported to the KD-50-I (which is based on the Loongson 2F processor), reaches only 30% of the theoretical peak performance. In this paper, we use loop unrolling to reduce memory access frequency, enhance temporal and spatial locality to reduce cache misses, and exploit the non-blocking cache mechanism to pipeline memory accesses. With these optimizations, the performance of ATLAS improves by 50%.
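To illustrate how loop unrolling lowers memory access frequency, here is a minimal sketch. It is not the paper's actual ATLAS kernel for the Loongson 2F. It assumes square, row-major `double` matrices whose dimension `n` is even, and the function names `dgemm_naive` and `dgemm_unroll2x2` are hypothetical. In the naive loop, every multiply-add needs two loads, one from `A` and one from `B`. Unrolling the `i` and `j` loops by 2 keeps a 2x2 block of `C` in registers, so each loaded element of `A` and `B` is used twice. This halves the number of loads per multiply-add.

```c
#include <assert.h>
#include <stddef.h>

/* Reference triple loop: C += A * B for n x n row-major matrices.
   Each iteration of the k loop loads one element of A and one of B
   for a single multiply-add. */
void dgemm_naive(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = C[i * n + j];
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Hypothetical 2x2 register-blocked variant (assumes n is even).
   The i and j loops are unrolled by 2, and the four C accumulators
   stay in registers. Each k iteration performs 4 loads for 4
   multiply-adds, versus 8 loads in the naive loop. */
void dgemm_unroll2x2(size_t n, const double *A, const double *B, double *C)
{
    assert(n % 2 == 0); /* this sketch does not handle remainders */
    for (size_t i = 0; i < n; i += 2)
        for (size_t j = 0; j < n; j += 2) {
            double c00 = C[i * n + j],       c01 = C[i * n + j + 1];
            double c10 = C[(i + 1) * n + j], c11 = C[(i + 1) * n + j + 1];
            for (size_t k = 0; k < n; k++) {
                double a0 = A[i * n + k], a1 = A[(i + 1) * n + k];
                double b0 = B[k * n + j], b1 = B[k * n + j + 1];
                c00 += a0 * b0; c01 += a0 * b1;
                c10 += a1 * b0; c11 += a1 * b1;
            }
            C[i * n + j]           = c00; C[i * n + j + 1]       = c01;
            C[(i + 1) * n + j]     = c10; C[(i + 1) * n + j + 1] = c11;
        }
}
```

Both functions add the `k` terms in the same order, so they produce identical results. Real kernels such as those generated by ATLAS usually unroll further and combine this with cache blocking and prefetching, which this sketch omits.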