RVV Optimization for PyTorch

    Abstract:

    The RISC-V software ecosystem is developing rapidly, and the international open-source community is actively porting and optimizing software for RISC-V. PyTorch, an open-source machine learning library for Python, offers strong performance, a broad open-source ecosystem, and wide adoption in research, and it is well supported on platforms such as x86, ARM, PowerPC, and CUDA. On RISC-V, however, porting work has so far concentrated on the standard instruction set and does not yet take full advantage of the RISC-V extensions, which leaves a considerable gap between RISC-V and mature ecosystems such as ARM and x86. In particular, because PyTorch lacks support for the RISC-V V (vector) extension (RVV), inference on RISC-V platforms is considerably slower than on ARM platforms of similar specifications. To address this, the paper proposes an efficient development scheme for supporting RVV 1.0 in PyTorch and uses RVV instructions to optimize PyTorch's depthwise convolution operators. A comparison on the K230 development board shows that the RVV-optimized depthwise convolution operators run approximately 1.35 to 3.8 times faster than the scalar implementation.
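    For orientation, the following is a minimal scalar sketch of the kind of baseline such a speedup is measured against, assuming the operator the abstract targets (深度卷积) is depthwise convolution, where each input channel is convolved only with its own filter. It is not the paper's code or PyTorch's kernel: the function name `depthwise_conv2d`, the nested-list `[C][H][W]` layout, and the absence of padding are assumptions made for illustration.

```python
def depthwise_conv2d(x, w, stride=1):
    """Scalar depthwise 2-D convolution (hypothetical reference, no padding).

    x: input as nested lists with shape [C][H][W]
    w: one filter per channel with shape [C][KH][KW]
    Returns the output with shape [C][OH][OW].
    """
    channels = len(x)
    height, width = len(x[0]), len(x[0][0])
    kh, kw = len(w[0]), len(w[0][0])
    out_h = (height - kh) // stride + 1
    out_w = (width - kw) // stride + 1
    out = [[[0.0] * out_w for _ in range(out_h)] for _ in range(channels)]

    for c in range(channels):          # channel c only ever sees filter c
        for oy in range(out_h):
            for ox in range(out_w):    # one output element per iteration
                acc = 0.0
                for ky in range(kh):
                    for kx in range(kw):
                        acc += x[c][oy * stride + ky][ox * stride + kx] * w[c][ky][kx]
                out[c][oy][ox] = acc
    return out


if __name__ == "__main__":
    x = [
        [[1, 2, 3], [4, 5, 6], [7, 8, 9]],   # channel 0
        [[1, 1, 1], [1, 1, 1], [1, 1, 1]],   # channel 1
    ]
    w = [
        [[1, 0], [0, 1]],                    # filter for channel 0
        [[0.5, 0.5], [0.5, 0.5]],            # filter for channel 1
    ]
    print(depthwise_conv2d(x, w))
```

    In a loop nest of this shape, the iterations over `ox` are independent of one another, so a vector implementation can plausibly compute several output columns per instruction; how the paper actually maps the operator onto RVV is not stated in the abstract.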

Cite this article

Wang Fan, Zhang Fei, Song Fuyuan, Yu Jiageng. RVV Optimization for PyTorch. Computer Systems & Applications (计算机系统应用),,():1-10

History
  • Received: 2024-10-14
  • Last revised: 2024-10-21
  • Published online: 2025-02-18