Computer Systems Applications, 2019, 28(10): 239-244
Research on Extraction Method of System Log Template
(1.Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China;2.University of Chinese Academy of Sciences, Beijing 100190, China;3.Fujian Longyan Tobacco Industrial Co. Ltd., Longyan 364021, China)
Received: March 22, 2019; Revised: April 17, 2019
Abstract: Extracting log templates is an effective way to handle massive volumes of system logs. Taking Web system logs as its starting point, this study extracts log templates with a method based on a signature (tag-recognition) tree, and builds on that method by studying and improving its log preprocessing and template-expression generation. To address the complex structure that is common in system logs, a text-similarity-based preprocessing method is used to classify log messages. A maximum template matching method is then used to fix the low template matching degree caused by inconsistent log formats and by word segmentation. Finally, the log template extraction method is evaluated experimentally: it achieves an accuracy of 96.4%, and the template matching degree increases substantially.
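The abstract names two steps, similarity-based classification of log messages and a template-matching lookup, without giving their details. The Python sketch below only illustrates the general idea under stated assumptions. The delimiter set in `tokenize`, the 0.5 threshold, greedy single-pass grouping by `difflib` ratio, and the position-wise `*` wildcard rule are all hypothetical choices, and it is not the paper's signature-tree method. `tokenize`, `cluster`, `extract_template` and `best_template` are illustrative names.

```python
# Illustrative sketch only; NOT the paper's signature-tree algorithm.
import re
from difflib import SequenceMatcher
from typing import List, Optional

WILDCARD = "*"  # assumed placeholder for variable fields


def tokenize(message: str) -> List[str]:
    """Split a raw log message into tokens (hypothetical delimiter set)."""
    return [t for t in re.split(r"[\s=,;\[\]]+", message) if t]


def similarity(a: List[str], b: List[str]) -> float:
    """Token-level similarity in [0, 1] from difflib's matching blocks."""
    return SequenceMatcher(None, a, b).ratio()


def cluster(messages: List[str], threshold: float = 0.5) -> List[List[List[str]]]:
    """Greedy grouping: a message joins the first group whose first member
    has the same token count and similarity >= threshold (assumed rule)."""
    groups: List[List[List[str]]] = []
    for msg in messages:
        tokens = tokenize(msg)
        for group in groups:
            rep = group[0]
            if len(rep) == len(tokens) and similarity(rep, tokens) >= threshold:
                group.append(tokens)
                break
        else:  # no similar group found: start a new one
            groups.append([tokens])
    return groups


def extract_template(group: List[List[str]]) -> str:
    """Keep a token where all members agree at that position; else WILDCARD."""
    return " ".join(
        col[0] if len(set(col)) == 1 else WILDCARD for col in zip(*group)
    )


def best_template(message: str, templates: List[str]) -> Optional[str]:
    """Return the same-length template with the highest share of matching
    constant tokens (wildcards score 0); None if nothing matches."""
    tokens = tokenize(message)
    best, best_score = None, 0.0
    for tpl in templates:
        parts = tpl.split(" ")
        if len(parts) != len(tokens):
            continue
        hits = sum(1 for p, t in zip(parts, tokens) if p != WILDCARD and p == t)
        score = hits / len(parts)
        if score > best_score:
            best, best_score = tpl, score
    return best


logs = [
    "User alice logged in from 10.0.0.1",
    "Connection closed by 10.0.0.9 port 22",
    "User bob logged in from 10.0.0.2",
    "Connection closed by 10.0.0.7 port 22",
]
templates = [extract_template(g) for g in cluster(logs)]
print(templates)  # ['User * logged in from *', 'Connection closed by * port 22']
print(best_template("User carol logged in from 192.168.1.5", templates))
```

Requiring equal token counts keeps the position-wise template rule simple, but it is exactly the kind of rigidity that inconsistent log formats and word segmentation break, which is the problem the paper's maximum template matching is described as solving.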
Funding: New-Generation ARP Pilot Project (XXH13502-01)
Citation:
LIU Hong-Qi, CHEN Yuan-Ping, MA Jian-Hua. Research on Extraction Method of System Log Template. Computer Systems Applications, 2019, 28(10): 239-244.