Received: March 27, 2012; Revised: May 18, 2012
Abstract: Targeting the Loongson 3A architecture, this paper improves the performance of LAPACK functions in three ways: optimizing the underlying BLAS library, choosing better block sizes for LAPACK's blocked algorithms, and individually optimizing specific LAPACK functions. Measured with the timing programs that ship with LAPACK, 240 LAPACK functions, 81% of all functions covered by the timing programs, improve in performance by more than 30%.
Keywords: LAPACK; BLAS; Loongson 3A; optimization; paired single
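Funding: National Science and Technology Major Project "Core Electronic Devices, High-end Generic Chips and Basic Software" (2009ZX01028-002-003-005); National Natural Science Foundation of China (60833004); Program of Introducing Talents of Discipline to Universities (B07033)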
Citation:
ZHANG Bin, GU Nai-Jie, HE Song-Song, LIU Bin-Bin. Optimization of LAPACK Based on Loongson 3A. COMPUTER SYSTEMS APPLICATIONS, 2012, 21(11): 63-67 (in Chinese)