Computer Systems Applications, 2022, 31(11): 358-364
Performance Optimization of Lattice QCD Solver and Its Heterogeneous Computation
(1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China; 3. Nanjing Normal University, Nanjing 210023, China)
Received: January 29, 2022; Revised: March 16, 2022
Abstract: Lattice quantum chromodynamics (lattice QCD) is an important theory and method for studying the interactions between microscopic particles such as quarks and gluons. By discretizing spacetime into a four-dimensional structured grid and defining the basic field quantities of QCD on that grid, researchers can use numerical simulation to study hadron interactions and properties from first principles. The computational cost of this process is enormous, however, and large-scale parallel computing is required. The core of a lattice QCD computation is the lattice QCD solver, which is the main computational hot spot of the program. This work studies the implementation and optimization of a lattice QCD solver on a domestic heterogeneous computing platform and proposes a solver design with three elements: a BiCGSTAB solver, which significantly reduces the number of iterations; even-odd preconditioning, which reduces the size of the problem to be solved; and memory-access optimization of the Dslash module, tailored to the characteristics of the domestic heterogeneous accelerator. Experimental tests show a speedup of about 30 times over the unoptimized solver, which provides a useful reference for the performance optimization of lattice QCD software on domestic heterogeneous supercomputers.
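To make the two algorithmic ingredients named in the abstract concrete, the following is a minimal CPU-only sketch in pure Python, not the authors' implementation. It assumes a toy operator M = 1 - κH on a one-dimensional ring, where H hops only between nearest neighbours with made-up phase "links" and asymmetric weights (so M is non-Hermitian). All names (`bicgstab`, `hopping`, `D`, `apply_schur`, `solve_even_odd`, `KAPPA`, `LINKS`) are hypothetical. Because H couples only sites of opposite parity, eliminating the odd sites gives the half-size system (1 - κ² D_eo D_oe) x_e = b_e + κ D_eo b_o, after which x_o = b_o + κ D_oe x_e.

```python
import cmath

def dot(x, y):
    # Inner product, conjugate-linear in the first argument.
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))

def norm(x):
    return abs(dot(x, x)) ** 0.5

def bicgstab(apply_A, b, tol=1e-10, max_iter=500):
    """Unpreconditioned BiCGSTAB for A x = b, starting from x = 0."""
    n = len(b)
    x, v, p = [0j] * n, [0j] * n, [0j] * n
    r = list(b)                 # initial residual b - A*0
    r_hat = list(r)             # fixed shadow residual
    rho = alpha = omega = 1 + 0j
    b_norm = norm(b)
    if b_norm == 0:
        return x, 0
    for k in range(1, max_iter + 1):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        v = apply_A(p)
        alpha = rho_new / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if norm(s) <= tol * b_norm:
            return [xi + alpha * pi for xi, pi in zip(x, p)], k
        t = apply_A(s)
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * pi + omega * si for xi, pi, si in zip(x, p, s)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        rho = rho_new
        if norm(r) <= tol * b_norm:
            return x, k
    raise RuntimeError("BiCGSTAB did not converge")

# Toy setup (assumed for illustration only).
N = 8                                                 # even number of ring sites
KAPPA = 0.2
LINKS = [cmath.exp(0.3j * (i + 1)) for i in range(N)]  # made-up unit phases

def hopping(i, x):
    # Nearest-neighbour term at site i; reads only sites of opposite parity.
    fwd = 0.5 * LINKS[i] * x[(i + 1) % N]
    bwd = 1.5 * LINKS[i - 1].conjugate() * x[(i - 1) % N]
    return fwd + bwd

def apply_M(x):
    # Full operator M = 1 - kappa * H on all N sites.
    return [x[i] - KAPPA * hopping(i, x) for i in range(N)]

def D(sub, out_parity):
    # Hopping from the sub-lattice of opposite parity onto out_parity sites.
    full = [0j] * N
    full[1 - out_parity::2] = sub
    return [hopping(i, full) for i in range(out_parity, N, 2)]

def apply_schur(x_e):
    # Even-site Schur complement: 1 - kappa^2 * D_eo * D_oe.
    return [xe - KAPPA ** 2 * d for xe, d in zip(x_e, D(D(x_e, 1), 0))]

def solve_even_odd(b):
    b_e, b_o = b[0::2], b[1::2]
    rhs = [be + KAPPA * d for be, d in zip(b_e, D(b_o, 0))]
    x_e, iters = bicgstab(apply_schur, rhs)           # half-size solve
    x_o = [bo + KAPPA * d for bo, d in zip(b_o, D(x_e, 1))]
    x = [0j] * N
    x[0::2], x[1::2] = x_e, x_o
    return x, iters

if __name__ == "__main__":
    b = [complex(i + 1, -i) for i in range(N)]
    _, it_full = bicgstab(apply_M, b)
    _, it_eo = solve_even_odd(b)
    print("iterations, full system:", it_full, "| even-odd system:", it_eo)
```

In this sketch the iterative solve runs on half as many unknowns, and the odd-site solution is recovered with a single extra hopping application. The Dslash memory-access optimization described in the abstract is specific to the accelerator hardware and is not represented here.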
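Funding: Chinese Academy of Sciences Category-B Strategic Priority Cultivation Project (XDPB25); Hygon Industry Ecosystem Cooperation Organization Fund (ghfund202107011598)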
Cite this article:
YANG Zi-Jiang, ZHANG Ke-Long, LIU Qian, XU Shun, SUN Peng. Performance Optimization of Lattice QCD Solver and Its Heterogeneous Computation. Computer Systems Applications, 2022, 31(11): 358-364