Inference Acceleration for Large Language Models on RISC-V Heterogeneous Platform

doi:10.15888/j.cnki.csa.010067


CSTR: 32024.14.csa.010067
Funding: National Key Research and Development Program of China (2023YFB4503902)



Abstract:

Large language models (LLMs) are now widely used across generative tasks, and their heavy computational load places stringent performance demands on the underlying hardware. RISC-V, an emerging open-source instruction set architecture, shows great potential owing to its customizability and extensibility. However, deploying mainstream LLMs on RISC-V platforms still faces challenges such as an immature software ecosystem and limited compute capability. This study proposes an inference acceleration method for LLMs on RISC-V platforms. A heterogeneous runtime environment is built around the Cambricon MLU370 accelerator card, for which the device driver is ported, the base libraries are compiled, and the PyTorch framework is adapted. On this foundation, a lightweight multi-threading optimization strategy is designed to improve the execution efficiency of core operators, such as the attention mechanism, on multi-core architectures. Experiments on the SG2042+MLU370-S4 platform show that, when deploying several mainstream LLMs and without relying on any other optimization strategy, the method achieves up to a 52.3-fold end-to-end inference speedup, demonstrating its feasibility and generality on RISC-V heterogeneous platforms.
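The abstract does not describe how the multi-threading strategy is implemented, so the following is only an illustrative sketch of one common way to spread attention work across cores: giving each attention head to its own worker thread. The function names (`attention_head`, `multi_head_attention`) and the `num_threads` parameter are hypothetical and not taken from the paper. The sketch uses pure Python lists, so the interpreter's global lock limits real speedup here; it shows the work partitioning, not the performance.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(q, k, v):
    """Scaled dot-product attention for a single head.

    q, k, v are lists of token vectors (lists of floats); q and k share
    one vector width, and every vector in v has the same width.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for q_vec in q:
        # Similarity of this query against every key.
        scores = [scale * sum(a * b for a, b in zip(q_vec, k_vec)) for k_vec in k]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output dimension at a time.
        out.append([
            sum(w * v_vec[d] for w, v_vec in zip(weights, v))
            for d in range(len(v[0]))
        ])
    return out


def multi_head_attention(heads, num_threads=4):
    """Compute each (q, k, v) head on a worker thread, preserving head order."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(lambda qkv: attention_head(*qkv), heads))
```

Because heads are independent of one another, the threaded result should match a plain sequential loop over the same heads, which makes the partitioning easy to check before moving it to a runtime with real parallelism.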


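Cite this article

Shen Zhengdong, Liu Yudong, Yu Jiageng, Tian Qing. Inference Acceleration for Large Language Models on RISC-V Heterogeneous Platform. Computer Systems & Applications (计算机系统应用), 2025, 34(12): 16-25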


History

Received: 2025-04-30
Last revised: 2025-06-24
Published online: 2025-11-04
