Abstract:For architecture of BWDSP, this paper analyzes the implementation of the optimizations, based on vectorization and software pipelining of parallel optimization technique, including special memory access instruction, zero-overhead looping instruction of improving efficiency in the loop, Instruction-level Parallelism(ILP), combined with the specific function of loop characteristics, expansion for the string and memory function of instruction level parallel mining. The experimental results show that the optimization rates of most functions of the theory have a running time of 1.5 times on hardware platform, which is of great importance to enhance the platform performance of BWDSP.