Abstract: As vector lengths grow, SIMD extensions can exploit more data-level parallelism, but the amount of parallelism a program must expose to be vectorized grows as well. In current auto-vectorizing compilers, if the analysis stage cannot find enough data-level parallelism in the scalar code to completely fill a vector register, the compiler never enters the vector code transformation stage and the code is not vectorized. Longer vectors therefore cause some programs with insufficient parallelism to lose the opportunity for vectorization, resulting in performance degradation. To make full use of SIMD components, this study introduces ISLP, a basic-block-oriented vectorization method for code with insufficient parallelism. Based on the GCC compiler, the design and implementation of ISLP are described in detail from three aspects: parallelism detection, code generation, and the cost model. Experiments on a standard test set show that the method can effectively vectorize programs with insufficient superword-level parallelism and improve execution efficiency. The average speedup of the selected test cases after vectorization reaches 1.14, and performance is 11.8% higher than with the conventional SLP method.
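
To make the notion of insufficient parallelism concrete, the following is a minimal sketch in C. It is not the ISLP implementation described in the paper. It assumes a hypothetical vector width of four float lanes (`LANES`), and the function names `add3_scalar` and `add_partial` are illustrative only. The first function is a basic block with three isomorphic statements, one fewer than needed to fill the assumed register. The second models, in plain scalar code, what a partially filled vector operation would compute: a partial load, a full-width add, and a partial store.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4 /* assumed vector width in float lanes (hypothetical) */

/* Scalar basic block with only three isomorphic statements,
 * which is fewer than LANES, so a full vector cannot be formed. */
void add3_scalar(float *a, const float *b, const float *c)
{
    a[0] = b[0] + c[0];
    a[1] = b[1] + c[1];
    a[2] = b[2] + c[2];
}

/* Scalar model of one partially filled vector operation.
 * Only the first `active` lanes carry real data; the rest are padding. */
void add_partial(float *a, const float *b, const float *c, size_t active)
{
    float vb[LANES] = {0}; /* unused lanes stay zero */
    float vc[LANES] = {0};
    float va[LANES];

    assert(active <= LANES);

    /* Partial load: fill only the active lanes. */
    for (size_t i = 0; i < active; i++) {
        vb[i] = b[i];
        vc[i] = c[i];
    }

    /* Full-width add over every lane, as one vector instruction would do. */
    for (size_t i = 0; i < LANES; i++)
        va[i] = vb[i] + vc[i];

    /* Partial store: write back only the active lanes. */
    for (size_t i = 0; i < active; i++)
        a[i] = va[i];
}
```

Calling `add_partial(a, b, c, 3)` produces the same three results as `add3_scalar(a, b, c)` and leaves memory beyond the third element untouched. Whether such a partially filled operation is profitable on real hardware is the kind of question a cost model has to answer.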