Abstract: Digital signal processors (DSPs) are widely used in signal processing and digital communication. Most modern high-performance DSPs adopt a very long instruction word (VLIW) architecture, which exploits instruction-level parallelism by issuing multiple instructions in the same clock cycle to achieve higher computing performance. This article describes the characteristics of the target system, BWDSP104x, a processor designed for high-performance computing that uses a 16-issue, single-instruction-stream, multiple-data-stream (SIMD) architecture. To make full use of its multi-cluster hardware resources, this paper proposes a back-end software pipelining optimization built on the open-source compiler Open64. The work covers loop selection in the early stage and the computation of resource constraints and precedence constraints; the classic modulo scheduling algorithm is used for software pipelining (SWP) scheduling, and modulo variable expansion resolves register conflicts between different iterations. The experimental results show that programs perform better after the software pipelining optimization.
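The resource and precedence constraints mentioned in the abstract are conventionally turned into a lower bound on the initiation interval (II) before modulo scheduling starts, and modulo variable expansion then picks a kernel unroll factor from the register lifetimes. The following is a minimal sketch of those textbook calculations, not the Open64 or BWDSP104x implementation. All function names, the unit counts, and the example numbers are hypothetical and chosen only for illustration.

```python
from math import ceil

def res_mii(op_counts, unit_counts):
    """Resource-constrained lower bound on II.

    op_counts[u]   = operations per loop iteration that need unit class u
    unit_counts[u] = number of units of class u (hypothetical values here)
    """
    return max(ceil(op_counts[u] / unit_counts[u]) for u in op_counts)

def rec_mii(recurrences):
    """Precedence (recurrence) constrained lower bound on II.

    Each recurrence is (total_latency, total_iteration_distance) summed
    around one dependence cycle in the loop's dependence graph.
    """
    return max((ceil(lat / dist) for lat, dist in recurrences), default=1)

def min_ii(op_counts, unit_counts, recurrences):
    """Smallest II worth trying: the larger of the two lower bounds."""
    return max(res_mii(op_counts, unit_counts), rec_mii(recurrences))

def mve_unroll_factor(lifetimes, ii):
    """Kernel copies needed so that a value whose lifetime exceeds II is
    not overwritten by the same instruction from a later iteration."""
    return max((ceil(life / ii) for life in lifetimes), default=1)

# Illustrative loop body (made-up numbers, not BWDSP104x data).
ops = {"alu": 6, "mul": 3, "mem": 4}
units = {"alu": 4, "mul": 2, "mem": 2}
cycles = [(3, 1)]          # one recurrence: latency 3, distance 1
ii = min_ii(ops, units, cycles)
unroll = mve_unroll_factor([2, 5, 7], ii)
```

In this sketch the resource bound is 2 and the recurrence bound is 3, so scheduling would start at II = 3. A modulo scheduler typically increases II from this bound until a conflict-free schedule is found.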