Computer Systems & Applications, 2019, Vol. 28, Issue 9: 209-214


Java API Sequence Recommendation Method Based on Attention Mechanism
ZHANG Rui-Feng1,2, WANG Peng-Cheng1,2, WU Ming1,2, XU Yun1,2
1. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China;
2. Key Laboratory of High Performance Computing of Anhui Province, Hefei 230026, China
Foundation item: General Program of National Natural Science Foundation of China (61672480)
Abstract: Using APIs and API sequences correctly is difficult for developers during software development. When developers face unfamiliar function libraries, or code repositories such as GitHub that contain a large number of APIs, they need the assistance of recommendation tools or systems. To the best of our knowledge, DeepAPI understands the semantics of a user's query better than earlier approaches, but its RNN-based model has several problems: (1) it does not consider the weight of each word; (2) the input sequence is compressed into a fixed-length vector, which loses much useful information; (3) long sentences lead to the loss of key information. Therefore, this study uses a model based on the attention mechanism to distinguish the importance of each word and to solve the long-distance dependency problem caused by long query inputs. We crawled 649 open-source Java projects from GitHub and processed them to obtain a training set of 114,364 annotation-API sequence pairs. The experimental results show that the proposed method increases the BLEU score by about 20% or more compared with DeepAPI on Top1, Top5, and Top10.
Key words: API sequences     recommendation     attention mechanisms     deep learning

 Figure 1 An example question and answer on Stack Overflow

1 Problem Definition and Related Work

1.1 Problem Definition

1.2 Related Work

1.2.1 Current State of Research on Code Search

1.2.2 Current State of Research on API Sequence Recommendation

1.3 Problems with Current API Sequence Recommendation Methods

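The statistical word alignment model trained by SWIM[4] is based on the bag-of-words model and does not take word order or positional relationships into account. For example, it has difficulty distinguishing the query "convert date to string" from "convert string to date". DeepAPI, proposed later by Gu et al.[15], uses an RNN model to learn the semantics of a sentence more effectively. In tests, its BLEU (BiLingual Evaluation Understudy)[16] score was about 173% higher than that of SWIM, which is based on a traditional model.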
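The bag-of-words limitation can be illustrated with a minimal sketch. The helper `bag_of_words` is a hypothetical name introduced here for illustration and is not taken from SWIM; it only shows that discarding word order makes the two example queries indistinguishable.

```python
from collections import Counter


def bag_of_words(query: str) -> Counter:
    """Count each word in the query, discarding word order (illustrative helper)."""
    return Counter(query.lower().split())


q1 = "convert date to string"
q2 = "convert string to date"

# The two queries contain exactly the same words, so their bags are equal.
same_bag = bag_of_words(q1) == bag_of_words(q2)

# As ordered sequences they differ, which is what a sequence model can exploit.
same_sequence = q1.split() == q2.split()

print(same_bag)       # True
print(same_sequence)  # False
```

A sequence model such as an RNN reads the words in order, so the two queries produce different inputs even though their word counts match.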

2 Design and Implementation of the Attention-Based Model

 Figure 2 Workflow for extracting API sequence pairs from Java code

2.1 Extracting the Required API Sequences with GrouMiner

 Figure 3 Example of a code comment

 Figure 4 Converting source code into a program dependence graph with GrouMiner

2.2 Encoder-Decoder Model with the Attention Mechanism

2.2.1 Shortcomings of the RNN-Based Encoder-Decoder Model

 $h_t = f\left(h_{t-1}, x_t\right)$ (1)
 $c = h_{T_x}$ (2)
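Equations (1) and (2) can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the model used in the paper: the recurrence $f$ is assumed to be $\tanh(W_h h_{t-1} + W_x x_t)$, the weights are random, and the names `rnn_step`, `encode`, `W_h`, and `W_x` are hypothetical. The point is that the context vector $c$ has the same fixed size regardless of the query length $T_x$.

```python
import math
import random


def rnn_step(h_prev, x_t, W_h, W_x):
    """One step of Eq. (1); f is ASSUMED to be tanh(W_h h_prev + W_x x_t)."""
    return [
        math.tanh(
            sum(w * h for w, h in zip(W_h[i], h_prev))
            + sum(w * x for w, x in zip(W_x[i], x_t))
        )
        for i in range(len(h_prev))
    ]


def encode(xs, W_h, W_x, hidden_size):
    """Run the recurrence over all inputs and return c = h_{T_x}, as in Eq. (2)."""
    h = [0.0] * hidden_size
    for x_t in xs:
        h = rnn_step(h, x_t, W_h, W_x)
    return h


random.seed(0)
hidden_size, embed_size = 8, 4

# Random illustrative weights; a real model would learn these during training.
W_h = [[random.uniform(-0.5, 0.5) for _ in range(hidden_size)] for _ in range(hidden_size)]
W_x = [[random.uniform(-0.5, 0.5) for _ in range(embed_size)] for _ in range(hidden_size)]


def random_query(length):
    """A stand-in for a sequence of word embeddings of the given length."""
    return [[random.uniform(-1.0, 1.0) for _ in range(embed_size)] for _ in range(length)]


c_short = encode(random_query(3), W_h, W_x, hidden_size)
c_long = encode(random_query(30), W_h, W_x, hidden_size)

# Both context vectors have hidden_size entries, whatever the query length.
print(len(c_short), len(c_long))  # 8 8
```

Because a 3-word query and a 30-word query are both squeezed into the same number of values, the longer input necessarily retains less information per word.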

 Figure 5 RNN-based Encoder-Decoder model

2.2.2 Encoder-Decoder Model Using the Attention Mechanism

 Figure 6 Attention-based Encoder-Decoder model

 Figure 7 Structure of Scaled Dot-Product Attention

 $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$ (3)
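Equation (3) can be written out directly. The following is a minimal single-head sketch in plain Python, without masking or learned projections; the function names and the toy matrices are illustrative assumptions, not the implementation used in the paper.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Eq. (3): softmax(Q K^T / sqrt(d_k)) V, with matrices as lists of rows."""
    d_k = len(K[0])
    weights = []
    for q in Q:
        # Dot product of the query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights.append(softmax(scores))
    # Each output row is the attention-weighted sum of the value rows.
    output = [
        [sum(w * v[j] for w, v in zip(w_row, V)) for j in range(len(V[0]))]
        for w_row in weights
    ]
    return output, weights


# Toy example: one query that matches the first key more closely than the second.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]

output, weights = scaled_dot_product_attention(Q, K, V)
print(weights[0])
print(output[0])
```

The weights in each row sum to 1, and the key that aligns best with the query receives the largest weight, which is how the model assigns a different importance to each input word.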

3 Experimental Analysis

3.1 BLEU Test Results

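BLEU[16] is an algorithm used in machine translation to evaluate the quality of text translated from one natural language into another. We use BLEU to compute a similarity score between the API call sequences we output and the reference API sequences, and use that score to judge the quality of the results. As shown in Figure 8, we compare the first result, the top five results, and the top ten results of each query. When a single result is returned, our method scores about 28.5% higher than DeepAPI.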
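How BLEU scores an API sequence against a reference can be sketched as follows. This is a simplified single-reference, unsmoothed variant written for illustration; the exact BLEU configuration used in the experiments is not stated here, and the API sequences below are made-up examples rather than data from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified BLEU: clipped n-gram precisions, geometric mean, brevity penalty."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(clipped / total))
    c, r = len(candidate), len(reference)
    # Brevity penalty: candidates shorter than the reference are penalized.
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


# Illustrative API sequences (assumed for this example, not taken from the paper).
reference = [
    "File.new", "FileReader.new", "BufferedReader.new",
    "BufferedReader.readLine", "BufferedReader.close",
]
exact = list(reference)
truncated = reference[:-1]  # the final call is missing

print(bleu(exact, reference))
print(bleu(truncated, reference))
```

An exact match scores 1.0, while the truncated sequence keeps perfect n-gram precision but is reduced by the brevity penalty, so a recommended sequence that omits a required call is scored lower.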

 Figure 8 BLEU scores of the models

4 Conclusion and Future Work

[1] Fowkes J, Sutton C. Parameter-free probabilistic API mining across GitHub. Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. Seattle, WA, USA. 2016. 254–265.
[2] Ko AJ, Myers BA, Aung HH. Six learning barriers in end-user programming systems. Proceedings of 2004 IEEE Symposium on Visual Languages-Human Centric Computing. Rome, Italy. 2004. 199–206.
[3] Robillard MP. What makes APIs hard to learn? Answers from developers. IEEE Software, 2009, 26(6): 27-34. DOI:10.1109/MS.2009.193
[4] Raghothaman M, Wei Y, Hamadi Y. SWIM: Synthesizing what I mean: Code search and idiomatic snippet synthesis. Proceedings of the IEEE/ACM 38th International Conference on Software Engineering. Austin, TX, USA. 2016. 357–367.
[5] Nie LM, Jiang H, Gao GJ, et al. Literature analysis of code search and API recommendation. Computer Science, 2017, 44(6A): 475-482. DOI:10.11896/j.issn.1002-137X.2017.6A.106
[6] Zhong H, Xie T, Zhang L, et al. MAPO: Mining and recommending API usage patterns. Proceedings of the 23rd European Conference on ECOOP 2009. Berlin, Heidelberg. 2009. 318–343.
[7] Bajracharya S, Ngo T, Linstead E, et al. Sourcerer: A search engine for open source code supporting structure-based search. Companion to the 21st ACM SIGPLAN Symposium on Object-oriented Programming Systems, Languages, and Applications. Portland, OR, USA. 2006. 681–682.
[8] McMillan C, Grechanik M, Poshyvanyk D, et al. Portfolio: Finding relevant functions and their usage. Proceedings of the 33rd International Conference on Software Engineering. Honolulu, HI, USA. 2011. 111–120.
[9] Bajracharya SK, Ossher J, Lopes CV. Leveraging usage similarity for effective retrieval of examples in code repositories. Proceedings of the 18th ACM SIGSOFT International Symposium on Foundations of Software Engineering. Santa Fe, NM, USA. 2010. 157–166.
[10] Keivanloo I, Rilling J, Zou Y. Spotting working code examples. Proceedings of the 36th International Conference on Software Engineering. Hyderabad, India. 2014. 664–675.
[11] Jiang H, Nie LM, Sun ZY, et al. ROSF: Leveraging information retrieval and supervised learning for recommending code snippets. IEEE Transactions on Services Computing, 2019, 12(1): 34-46. DOI:10.1109/TSC.2016.2592909
[12] Rahman MM, Roy CK, Lo D. Rack: Automatic API recommendation using crowdsourced knowledge. Proceedings of IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER). Suita, Japan. 2016. 349–359.
[13] Thung F, Wang SW, Lo D, et al. Automatic recommendation of API methods from feature requests. Proceedings of the 28th IEEE/ACM International Conference on Automated Software Engineering. Silicon Valley, CA, USA. 2013. 290–300.
[14] Niu HR, Keivanloo I, Zou Y. Learning to rank code examples for code search engines. Empirical Software Engineering, 2017, 22(1): 259-291. DOI:10.1007/s10664-015-9421-5
[15] Gu XD, Zhang HY, Zhang DM, et al. Deep API learning. Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. Seattle, WA, USA. 2016. 631–642.
[16] Papineni K, Roukos S, Ward T, et al. BLEU: A method for automatic evaluation of machine translation. Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Philadelphia, PA, USA. 2002. 311–318.
[17] Nguyen TT, Nguyen HA, Pham NH, et al. Graph-based mining of multiple object usage patterns. Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering. Amsterdam, The Netherlands. 2009. 383–392.
[18] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. Advances in Neural Information Processing Systems 30. 2017. 5998–6008.