﻿ 基于分治法求解对称三对角矩阵特征问题的混合并行实现
 计算机系统应用  2019, Vol. 28 Issue (9): 246-250 PDF

1. 中国科学院 计算机网络信息中心, 北京 100190;
2. 中国科学院大学, 北京 100049

Hybrid Parallel Algorithm Using MPI/Cilk for Symmetric Tridiagonal Eigenproblems
ZHU Jing-Qiao1,2, ZHAO Yong-Hua1
1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China;
2. University of Chinese Academy of Sciences, Beijing 100049, China
Foundation item: National Key Research and Development Program of China (2017YFB0202202, 2016YFB0201302); CAS Special Fund for Informatization Construction in 13th Five-Year Plan (XXH13506-405)
Abstract: Divide and conquer algorithm is widely used for tridiagonal matrix eigenproblems while computing efficiency and storage limitation are always bottlenecks for large scale problems. In this study, the proposed eigenproblem algorithm based on hybrid parallel paradigm with MPI/Cilk optimizes the divide and conquer algorithm both at data and task levels. The introduced task-based parallelization mechanism inside computing nodes solves the problem in data dependence and thread starvation by directed acyclic graph model. By coarse-grained partition of tasks the overhead of data communication among MPI nodes is also optimized, which helps to improve load balance. The numerical test is carried out and the result is compared with the pure MPI and MPI/openMP parallel algorithm, which shows the performance and efficiency of the algorithm.
Key words: parallel computing     symmetric eigenproblem     DC method     Cilk

Cilk作为一个基于任务的并行编程模型, 为共享内存系统提供了一个高效的任务窃取调度器, 适合用来为分治和递归类算法实现高效的多核任务并行. 对于并行编程, Cilk为串行程序的并行化提供了向量化、锁、视图等机制[6], 通过关键字来声明操作, 改造后的代码具有语义化、可读性强的特点.

1 分治算法求解的基本原理 1.1 矩阵划分

 $\begin{split} T &= \left( {\begin{array}{*{20}{c}} {{T_1}}&{\beta {e_k}e_1^{\rm{T}}} \\ {\beta {e_1}e_k^{\rm{T}}}&{{T_2}} \end{array}} \right) \\ &= \left( {\begin{array}{*{20}{c}} {{{T'}_1}}&0 \\ 0&{{{T'}_2}} \end{array}} \right) + \beta \left( {\begin{array}{*{20}{c}} {{e_k}} \\ {{e_1}} \end{array}} \right)\left( {\begin{array}{*{20}{c}} {e_k^{\rm{T}}}&{e_1^{\rm{T}}} \end{array}} \right) \\ \end{split}$ (1)

1.2 分治法求解过程

 ${T'_1} = {Q_1}{D_!}{Q_1}^{\rm{T}},{T'_2} = {Q_2}{D_2}{Q_2}^{\rm{T}}$ (2)

 $z = diag{\left( {{Q_1},{Q_2}} \right)^{\rm{T}}}\left( {\begin{array}{*{20}{c}} {{e_k}} \\ {{e_1}} \end{array}} \right),$ (3)

 $\begin{split} T &= \left( {\begin{array}{*{20}{c}} {{Q_1}}&0\\ 0&{{Q_2}} \end{array}} \right)\left[ {\left( {\begin{array}{*{20}{c}} {{D_1}}&{}\\ {}&{{D_2}} \end{array}} \right) + \beta z{z^{\rm{T}}}} \right]{\left( {\begin{array}{*{20}{c}} {{Q_1}}&0\\ 0&{{Q_2}} \end{array}} \right)^{\rm{T}}}\!\!\!\!\!\!\!\!\!\\ &= Q\left( {D + \beta z{z^{\rm{T}}}} \right){Q^{\rm{T}}}. \end{split}$ (4)

D+βzzT=W, 因为diag(Q1, Q2)为正交矩阵, 所以TW相似有相同的特征值, 求T的特征分解T=QΛQT, 可以通过先求出W的特征分解: W=PΛPT, 再令Q=diag(Q1, Q2)P, 即可求得T的特征分解.

 $1 + \beta \sum\limits_i {\frac{{x_i^2}}{{{d_i} - \lambda }}} = 0,{d_i} = {D_{ii}}$ (5)

1.3 注意事项

2 分治算法的并行化 2.1 节点内并行实现

1) 判断输入矩阵的规模, 如果矩阵规模小于等于设定 阈值, 执行2); 否则执行3);

2) 调用特征求解算法进行特征分解, 返回结果;

3) 划分矩阵T得子矩阵T1, T2, 通过Cilk执行任务划分, 并行地对T1, T2递归调用步骤1), 求出子矩阵的特征值和特征向量后, 进行排序和迭代逼近等操作合并子矩阵的特征值和特征向量, 得到T的特征分解, 返回结果.

Cilk在程序递归执行过程中不断划分新的任务, 产生Cilk线程去处理, 并把其分配到空闲的核上, 和父线程并行执行. 一个Cilk任务是在同步、生成新线程或返回结果之前执行的最长序列化指令集或代码段[10]. 求解过程中, 递归调用T1之前的过程可看作任务A, 递归调用T2的过程可看作任务B, 同步及合并过程可看作任务C. 现假设程序只进行了四次嵌套调用, 则全局任务划分后生成的任务流如图1所示.

2.2 多节点混合并行实现

 图 1 分治算法的任务并行流程

1) 组内进程通过互补通信交换求得的局部特征向量矩阵, 接着向主控进程发送本进程求得的局部特征值和拼接的修正向量;

2) 主控进程收集所有组内进程求得的局部特征值进行排序, 并保留排序后的位置索引数组, 并根据此数组将修正向量排序, 将排好序的特征值和修正向量划分到组内各进程;

3) 组内进程接收主控进程发送过来的排好序的部分特征值和修正向量, 求解secular方程和特征向量;

4) 各组内进程按列执行分块矩阵乘法, 更新当前轮循环的部分特征向量矩阵, 接着执行下一轮循环.

2.3 算法分析

3 实验结果与分析

 图 2 MPI模型和MPI/Cilk模型的加速比曲线

4 结论与展望

 [1] Cuppen JJM. A divide and conquer method for the symmetric tridiagonal eigenproblem. Numerische Mathematik, 1980, 36(2): 177-195. DOI:10.1007/BF01396757 [2] Gu M, Eisenstat SC. A stable and efficient algorithm for the rank-one modification of the symmetric eigenproblem. SIAM Journal on Matrix Analysis and Applications, 1994, 15(4): 1266-1276. DOI:10.1137/S089547989223924X [3] Tisseur F, Dongarra J. A parallel divide and conquer algorithm for the symmetric eigenvalue problem on distributed memory architectures. SIAM Journal on Scientific Computing, 1998, 20(6): 2223-2236. [4] Zhao YH, Chen J, Chi XB. Solving the symmetric tridiagonal eigenproblem using MPI/OpenMP hybrid parallelization. Cao J, Nejdl W, Xu M. Lecture Notes in Computer Science. Berlin: Springer, 2005. 3756. 164–173. [5] Ammar GS, Reichel L, Sorensen DC. An implementation of a divide and conquer algorithm for the unitary eigen problem. ACM Transactions on Mathematical Software, 1992, 18(3): 292-307. DOI:10.1145/131766.131770 [6] Intel Cilk Plus. Intel Cilk reference manual and documentation, https://www.cilkplus.org/cilk-plus-tutoriall. [7] Melman A. Numerical solution of a secular equation. Numerische Mathematik, 1995, 69(4): 483-493. DOI:10.1007/s002110050104 [8] Dhillon IS, Parlett BN. Orthogonal eigenvectors and relative gaps. SIAM Journal on Matrix Analysis and Applications, 2003, 25(3): 858-899. DOI:10.1137/S0895479800370111 [9] Dhillon IS, Parlett BN. Multiple representations to compute orthogonal eigenvectors of symmetric tridiagonal matrices. Linear Algebra and its Applications, 2004, 387: 1-28. DOI:10.1016/j.laa.2003.12.028 [10] Supercomputing Technologies Group, MIT Laboratory for Computer Science. Cilk-5.4.6 reference manual. Cambridge, Massachusetts, USA: Massachusetts Institute of Technology. http://supertech.csail.mit.edu/cilk/manual-5.4.6.pdf.