Construction and Application of Carbon Knowledge Base Driven by Large Language Model

Abstract:

Large language models (LLMs) demonstrate excellent capabilities in natural language understanding and generation, but in domain-specific, knowledge-intensive tasks they still face insufficient factual accuracy, difficulty in updating knowledge, and a shortage of high-quality domain datasets. Retrieval-augmented generation (RAG) has emerged as an effective way to address these challenges. However, when applied to knowledge-intensive tasks in the carbon domain, RAG still has shortcomings: query understanding is prone to bias, external knowledge retrieval strategies are rigid and one-dimensional, retrieved results are poorly relevant to actual needs, and no dedicated dataset exists for evaluating question-answering performance. To tackle these issues, this study proposes a Multi-pipeline-based RAG method that uses the graph-enhanced recursive intelligent merge retrieval proposed in this paper to effectively improve retrieval precision. To remedy the lack of domain-specific Q&A datasets, a method is proposed in which a large model automatically generates Q&A datasets from parent-node text. In addition to traditional metrics such as precision and recall, the text understanding capability of LLMs is used for four evaluations: (1) response-context-query relevance; (2) response-query relevance; (3) context-query relevance; and (4) faithfulness. In comparative experiments against BM25-based RAG, Vector-based RAG, and Recursive-based RAG, the Multi-pipeline-based RAG built on the GLM-4-Plus model achieves a precision of 85%, higher than the other methods.
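The abstract does not spell out the paper's graph-enhanced recursive intelligent merge retrieval, but the general shape of the comparison it reports can be sketched: multiple retrieval pipelines (e.g. a BM25-based and a vector-based one) each return a ranked list, the lists are merged, and the merged result is scored with retrieval precision and recall against a gold set. The sketch below is illustrative only; the merge strategy shown (reciprocal rank fusion) and the document ids are assumptions, not the paper's method.

```python
# Illustrative sketch: merging several retrieval pipelines and scoring the
# merged result. Reciprocal rank fusion (RRF) stands in for the paper's
# (unspecified here) merge strategy; doc ids and rankings are made up.

def rrf_merge(rankings, k=60, top_n=5):
    """Merge ranked lists of document ids with reciprocal rank fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # Each pipeline contributes 1/(k + rank) credit; documents that
            # rank well in several pipelines accumulate the highest score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

def precision_recall(retrieved, relevant):
    """Retrieval precision/recall of retrieved ids against a gold set."""
    hit = len(set(retrieved) & set(relevant))
    precision = hit / len(retrieved) if retrieved else 0.0
    recall = hit / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical outputs of two pipelines (e.g. BM25-based and vector-based):
bm25_ranking = ["d3", "d1", "d7", "d2"]
vector_ranking = ["d1", "d4", "d3", "d9"]

merged = rrf_merge([bm25_ranking, vector_ranking], top_n=3)
p, r = precision_recall(merged, relevant={"d1", "d3", "d4"})
```

Here `d1` and `d3` appear near the top of both lists, so fusion promotes them above documents that only one pipeline found; the paper's LLM-based relevance and faithfulness evaluations would then judge the generated answer against this merged context.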

Cite this article

芦成飞 (Lu Chengfei). Construction and Application of Carbon Knowledge Base Driven by Large Language Model. 计算机系统应用 (Computer Systems & Applications), 2025, 34(12): 75-88


History
  • Received: 2025-05-12
  • Last revised: 2025-06-05
  • Published online: 2025-10-21
Copyright: Institute of Software, Chinese Academy of Sciences