Computer Systems Applications, 2023, 32(2): 400-405
Implementation and Optimization of High-precision Dot Product Algorithm Based on SW1621 Processor
(Research Institute of Frontier Information Technology, Zhongyuan University of Technology, Zhengzhou 450007, China)
Received: June 20, 2022    Revised: July 18, 2022
Abstract: The dot product is a Level 1 routine in the BLAS library and is widely called in scientific computing and related fields. Because floating-point arithmetic introduces rounding errors, the double-precision dot product in existing BLAS libraries cannot meet the accuracy requirements of some applications, so high-precision algorithms are needed for more accurate and reliable computation. This study targets the domestic SW1621 platform and adds a high-precision dot product interface to an existing BLAS library to meet these requirements. The high-precision dot product is then hand-optimized at the assembly level using loop unrolling, memory-access optimization, and instruction reordering. Experimental results show that the high-precision dot product achieves roughly twice the precision of the double-precision dot product, effectively improving on the accuracy of the original algorithm. With this accuracy gain preserved, the optimized high-precision dot product function achieves an average speedup of 1.61 over the unoptimized version.
Keywords: SW1621; dot product; high precision; BLAS library interface; performance optimization
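The abstract does not say which high-precision algorithm the paper implements. As an illustration only, the sketch below assumes a compensated dot product in the style of the Dot2 algorithm of Ogita, Rump, and Oishi, which uses error-free transformations to recover the rounding error of each multiply and add, and delivers a result about as accurate as one computed in twice the working precision. The function names (`two_sum`, `two_prod`, `dot2`, `naive_dot`) are hypothetical and are not the interface added to the BLAS library in the paper.

```python
def two_sum(a, b):
    """Error-free addition (Knuth): s = fl(a + b) and a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def split(a):
    """Veltkamp split of a double into two non-overlapping halves, a == hi + lo."""
    c = 134217729.0 * a  # 2**27 + 1 for IEEE 754 binary64
    hi = c - (c - a)
    lo = a - hi
    return hi, lo


def two_prod(a, b):
    """Error-free product (Dekker): p = fl(a * b) and a * b == p + e,
    barring overflow and underflow."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = al * bl - (((p - ah * bh) - al * bh) - ah * bl)
    return p, e


def dot2(x, y):
    """Compensated dot product: accumulates the rounding errors of every
    product and every partial sum, then adds them back at the end."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    p = 0.0  # running double-precision sum
    s = 0.0  # running sum of captured rounding errors
    for a, b in zip(x, y):
        h, r = two_prod(a, b)  # r is the rounding error of a * b
        p, q = two_sum(p, h)   # q is the rounding error of p + h
        s += q + r
    return p + s


def naive_dot(x, y):
    """Plain double-precision dot product, for comparison."""
    acc = 0.0
    for a, b in zip(x, y):
        acc += a * b
    return acc
```

For x = [1e16, 1.0, -1e16] and y = [1.0, 1.0, 1.0], whose exact dot product is 1, `naive_dot` returns 0.0 because the 1.0 is lost when added to 1e16, whereas `dot2` returns 1.0. On hardware with a fused multiply-add instruction, the product error is usually obtained with a single FMA instead of the split-based `two_prod` shown here.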
Citation:
XU Fang-Jie, WANG Lei, WANG Yi-Zhuo, ZHANG Ya-Guang. Implementation and Optimization of High-precision Dot Product Algorithm Based on SW1621 Processor. Computer Systems Applications, 2023, 32(2): 400-405