Computer Systems Applications (English version): 2017, 26(11):101-108
Research on the Optimization of BLAS Level 1 and 2 Functions on Shenwei Many-Core Processor
(1. Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China)
Received: February 21, 2017    Revised: March 09, 2017
Abstract: BLAS (Basic Linear Algebra Subprograms) is a specification that prescribes a set of low-level routines for common linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations, and matrix multiplication. Its routines are divided into three levels, which provide basic vector-vector (level 1), matrix-vector (level 2), and matrix-matrix (level 3) operations, respectively. In this paper, we study the parallel implementation of BLAS level 1 and level 2 functions on the Shenwei many-core processor, exploit the characteristics of the platform to tune their performance in depth, and summarize techniques for parallelizing and optimizing programs on the Shenwei platform. The Shenwei 26010 CPU uses a heterogeneous many-core architecture; its many computing cores provide large-scale parallel processing capability, giving a single chip a double-precision floating-point performance of 3 TFLOPS. Experimental results show that the BLAS level 1 and level 2 functions achieve average speedups as high as 11.x and 6.x, respectively, over the GotoBLAS reference implementations, and that each optimization technique yields a clear performance gain.
Keywords: BLAS; heterogeneous many-core; task parallelism; SIMD vectorization
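For readers unfamiliar with the level distinction drawn in the abstract, the following is a minimal plain-C sketch of one level 1 operation (AXPY) and one level 2 operation (GEMV). It is not the authors' Shenwei implementation and contains no platform-specific optimization; the names `ref_daxpy` and `ref_dgemv` and their simplified signatures (unit stride, no transpose option) are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Level 1 (vector-vector): y := alpha * x + y, the AXPY operation.
 * Simplifying assumption: unit stride only; standard BLAS also
 * accepts incx/incy stride arguments. */
void ref_daxpy(size_t n, double alpha, const double *x, double *y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}

/* Level 2 (matrix-vector): y := alpha * A * x + beta * y, the GEMV
 * operation. A is m-by-n, stored column-major with leading dimension
 * lda >= m. Simplifying assumption: no transpose option, unit stride. */
void ref_dgemv(size_t m, size_t n, double alpha, const double *a,
               size_t lda, const double *x, double beta, double *y)
{
    assert(lda >= m);

    /* Scale the output vector first. */
    for (size_t i = 0; i < m; i++)
        y[i] *= beta;

    /* Column sweep: each column of A contributes one AXPY onto y. */
    for (size_t j = 0; j < n; j++)
        ref_daxpy(m, alpha * x[j], a + j * lda, y);
}
```

Both routines perform only a constant number of floating-point operations per element loaded from memory, which is why level 1 and level 2 routines tend to be limited by memory bandwidth rather than by peak arithmetic throughput.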
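Funding: Integrated Project of the Major Research Plan of the National Natural Science Foundation of China (91530323); National High Technology Research and Development Program of China (863 Program) (2015AA01A302)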
Citation:
SUN Jia-Dong, SUN Qiao, DENG Pan, YANG Chao. Research on the Optimization of BLAS Level 1 and 2 Functions on Shenwei Many-Core Processor. COMPUTER SYSTEMS APPLICATIONS, 2017, 26(11):101-108