:2019,28(9):25-32
Performance Optimizing for Large-Scale Lattice Quantum Chromodynamics of Configuration Generating and Glueball Measurement
(1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China; 3. Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China)
Received: 2019-02-21    Revised: 2019-03-08
Abstract: Lattice Quantum Chromodynamics (LQCD) is the non-perturbative computational method currently known to allow a systematic study of the low-energy strong interactions between quarks and gluons. The statistical and systematic uncertainties of LQCD results are in principle all controllable and can be reduced step by step. By the basic principles of LQCD, a larger lattice volume makes it possible to compute physical processes in a larger region of space and to discretize that space more finely, which yields more accurate results. Large-volume lattice calculations are therefore of great significance to the study of QCD theory, but they place higher demands on program performance. In this work, we study the large-scale parallelization and performance optimization of the basic programs for LQCD configuration generation and glueball measurement. Building on the blocking and even-odd algorithms used in LQCD simulation, we design a parallel algorithm based on MPI and OpenMP, together with an optimized data communication module. For numerical kernels such as the multiplication of complex matrices, we propose a vectorized computation method. For the bottleneck in writing configuration files, we propose a scheme for writing configuration files in parallel. The simulation programs were tested and analyzed on an Intel KNL platform and on the x86_64 queues of the "Tianhe-2" supercomputer. The results confirm the effectiveness of these optimizations, and the parallel efficiency is analyzed as well. The largest test used 1728 nodes (i.e. 41,472 CPU cores).
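The abstract names two building blocks without showing them: the even-odd decomposition of lattice sites and the multiplication of complex matrices. The following is a minimal sketch of both, not the authors' implementation. It assumes the common checkerboard convention (a site is even when the sum of its coordinates is even), periodic boundaries, and even lattice extents. The function names `site_parity`, `split_even_odd`, `neighbors`, and `matmul3` are hypothetical, and the plain-Python matrix product stands in for the vectorized kernel, which the abstract does not describe.

```python
from itertools import product


def site_parity(coords):
    """Return 0 for an even site and 1 for an odd site.

    Assumed convention: parity is the sum of the coordinates modulo 2.
    """
    return sum(coords) % 2


def split_even_odd(dims):
    """Partition all sites of a lattice with extents `dims` into (even, odd) lists."""
    even, odd = [], []
    for coords in product(*(range(n) for n in dims)):
        (odd if site_parity(coords) else even).append(coords)
    return even, odd


def neighbors(coords, dims):
    """Nearest neighbours of a site, assuming periodic boundary conditions."""
    result = []
    for mu in range(len(dims)):
        for step in (1, -1):
            shifted = list(coords)
            shifted[mu] = (shifted[mu] + step) % dims[mu]
            result.append(tuple(shifted))
    return result


def matmul3(a, b):
    """Product of two 3x3 complex matrices stored as nested lists (scalar reference version)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


if __name__ == "__main__":
    even, odd = split_even_odd((4, 4, 4, 4))
    print(len(even), len(odd))
```

With even extents, every nearest neighbour of an even site is odd, so all sites of one parity can be updated independently of each other. This is the property that makes such a decomposition a natural fit for MPI and OpenMP parallelism.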
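Funding: National Key Research and Development Program of China (2017YFB0203202); General Program of the National Natural Science Foundation of China (11575197)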
Cite this article:
TIAN Ying-Qi, BI Yu-Jiang, HE Yu-Qing, MA Yun-Heng, LIU Zhao-Feng, XU Shun. Performance Optimizing for Large-Scale Lattice Quantum Chromodynamics of Configuration Generating and Glueball Measurement. Computer Systems Applications, 2019, 28(9): 25-32
