Abstract: Source code summarization aims to automatically generate precise natural language summaries for source code, helping developers better understand and maintain it. Traditional approaches generate summaries using information retrieval techniques, either selecting representative words from the source code itself or adapting the summaries of similar code snippets; recent work adopts machine translation methods, generating summaries of code snippets with encoder-decoder neural network models. However, existing summary generation methods suffer from two main problems. On the one hand, neural network-based methods favor the high-frequency words that appear in code snippets but tend to handle low-frequency words poorly; on the other hand, programming languages are highly structured, so source code cannot simply be treated as serialized text, or contextual structural information will be lost. To address the low-frequency word problem, a retrieval-based neural machine translation approach is proposed: similar code snippets retrieved from the training set are used to enhance the neural network model. In addition, to learn the structural semantic information of code snippets, this study proposes a structure-guided Transformer, which encodes the structural information of code through an attention mechanism. Experimental results show that the model has significant advantages over current state-of-the-art deep learning models for code summarization in handling low-frequency words and structural semantics.
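One way to read "encodes structural information of code through an attention mechanism" is as a structural mask applied to self-attention scores, so that each token attends only to tokens it is related to in the code's structure (e.g., AST neighbors). The sketch below is an illustrative assumption, not the paper's actual architecture: a single-head, projection-free attention step in NumPy, where `adj` is a hypothetical structural adjacency matrix and unrelated token pairs are masked out before the softmax.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def structure_guided_attention(X, adj, d_k):
    """Self-attention restricted by a structural adjacency mask.

    X   : (n, d) token embeddings for n code tokens.
    adj : (n, n) binary matrix; adj[i, j] = 1 if tokens i and j are
          structurally related (e.g., neighbors in the AST). This is an
          assumed input format for illustration only.
    d_k : key dimension used for score scaling.
    """
    Q, K, V = X, X, X  # single head, no learned projections, for brevity
    scores = Q @ K.T / np.sqrt(d_k)
    # Mask out structurally unrelated pairs so they get ~zero attention weight.
    scores = np.where(adj > 0, scores, -1e9)
    return softmax(scores) @ V
```

With this masking, a token whose adjacency row selects only itself simply reproduces its own value vector, while connected tokens mix information along structural edges; a full model would stack such layers with learned projections and residual connections.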