Computer Systems Applications, 2012, 21(11): 63-67
Optimization of LAPACK Based on Loongson 3A
ZHANG Bin1,2,3, GU Nai-Jie1,2,3, HE Song-Song1,2,3, LIU Bin-Bin1,2,3
(1.School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China; 2.Anhui Province Key Laboratory of Computing and Communication Software, Hefei 230027, China; 3.USTC & SICT Network and Communication Joint Laboratory, Hefei 230027, China)
Received: March 27, 2012; Revised: May 18, 2012
Abstract: Targeting the characteristics of the Loongson 3A architecture, this paper presents three ways to improve the performance of LAPACK functions: optimizing the underlying BLAS library, selecting the best block size for the blocked algorithms in LAPACK, and optimizing specific LAPACK functions individually. Experimental results obtained with the LAPACK Timing Programs show that the performance of 240 LAPACK functions, which account for 81% of all functions measured by the timing programs, improves by more than 30%.
Keywords: LAPACK; BLAS; Loongson 3A; optimization; paired-single precision
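The abstract names the choice of block size for LAPACK's blocked algorithms as one of the three optimizations but does not describe the tuning procedure. As a hedged illustration only, the C sketch below times a simple tiled matrix multiply for a few candidate block sizes and keeps the fastest. The kernel is a toy stand-in, not the paper's code and not a LAPACK routine (reference LAPACK obtains its block size NB from ILAENV with ISPEC = 1), and the names `blocked_gemm`, `time_block_size`, and `pick_block_size` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Smaller of two ints; used to clip the last, partial tile. */
static int min_int(int a, int b) { return a < b ? a : b; }

/* Hypothetical kernel: C += A * B for n x n column-major matrices,
 * processed in nb x nb tiles. nb plays the role of a block size NB. */
void blocked_gemm(int n, int nb, const double *A, const double *B, double *C)
{
    assert(n >= 0 && nb > 0);
    for (int jj = 0; jj < n; jj += nb)
        for (int kk = 0; kk < n; kk += nb)
            for (int ii = 0; ii < n; ii += nb) {
                int jmax = min_int(jj + nb, n);
                int kmax = min_int(kk + nb, n);
                int imax = min_int(ii + nb, n);
                for (int j = jj; j < jmax; j++)
                    for (int k = kk; k < kmax; k++) {
                        double b = B[k + (size_t)j * n];
                        for (int i = ii; i < imax; i++)
                            C[i + (size_t)j * n] += A[i + (size_t)k * n] * b;
                    }
            }
}

/* CPU time, in seconds, of one blocked_gemm call with tile size nb. */
double time_block_size(int n, int nb, const double *A, const double *B, double *C)
{
    clock_t start = clock();
    blocked_gemm(n, nb, A, B, C);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Return the candidate tile size with the smallest measured time.
 * Falls back to candidates[0] if allocation fails. C is reused
 * across runs; its accumulated contents do not matter for timing. */
int pick_block_size(int n, const int *candidates, int count)
{
    assert(n > 0 && candidates != NULL && count > 0);
    size_t len = (size_t)n * (size_t)n;
    double *A = malloc(len * sizeof *A);
    double *B = malloc(len * sizeof *B);
    double *C = calloc(len, sizeof *C);
    int best = candidates[0];
    if (A && B && C) {
        for (size_t i = 0; i < len; i++) {
            A[i] = (double)(i % 7);
            B[i] = (double)(i % 5);
        }
        double best_time = -1.0;
        for (int c = 0; c < count; c++) {
            double t = time_block_size(n, candidates[c], A, B, C);
            if (best_time < 0.0 || t < best_time) {
                best_time = t;
                best = candidates[c];
            }
        }
    }
    free(A);
    free(B);
    free(C);
    return best;
}
```

For example, `pick_block_size(512, (int[]){32, 64, 128}, 3)` returns whichever of the three tile sizes ran fastest on the current machine. A more careful tuner would repeat each measurement, discard the cold-cache first run, and sweep the matrix sizes of interest; the result applies only to this toy kernel.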
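Foundation items: National Science and Technology Major Project on Core Electronic Devices, High-end Generic Chips and Basic Software (2009ZX01028-002-003-005); National Natural Science Foundation of China (60833004); Programme of Introducing Talents of Discipline to Universities (B07033)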
Citation:
ZHANG Bin, GU Nai-Jie, HE Song-Song, LIU Bin-Bin. Optimization of LAPACK Based on Loongson 3A. Computer Systems Applications, 2012, 21(11): 63-67.