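Fault Diagnosis Method Based on Trace Similarity Matching
Authors: Chen Hao, Xu Yuanjia, Wang Tao, Zhang Wenbo
Funding: National Key Research and Development Program of China (2017YFB1400804); National Natural Science Foundation of China (61872344); Beijing Natural Science Foundation (4182070); Youth Innovation Promotion Association of the Chinese Academy of Sciences (2018144)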
    Abstract:

    With the rapid growth of Internet services, distributed microservice applications have gradually replaced traditional monolithic applications as one of the main forms of Internet application. Microservice applications offer scalability, fault tolerance, and high availability, but they are also cumbersome to build, complex to deploy, and difficult to maintain. Monitoring and operating microservices in cloud environments is an active research topic, yet existing solutions, such as Kubernetes, the most popular container-based cluster management system, remain coarse-grained and locate faults inaccurately. To address these problems, this study proposes a microservice fault diagnosis method based on trace similarity matching. First, injected proxies forward request traffic so that the trace information of the microservices can be collected and modeled. Next, state information is collected while the system runs normally, and known faults are injected to collect and characterize the application's behavior after each fault occurs. Finally, the execution traces of an unknown fault are matched against those of the known faults, with string edit distance as the similarity measure, to diagnose the likely cause. Experimental results show that the method effectively characterizes the execution traces of request processing and accurately locates the cause of application faults at microservice granularity.
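
The matching step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each trace is encoded as the ordered list of microservice names a request passed through, normalizes the edit distance by the length of the longer trace, and labels an unknown trace with the nearest known fault. The function names, the normalization, and the Bookinfo-style sample traces are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two sequences (one row kept in memory)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x from a
                            curr[j - 1] + 1,      # insert y into a
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Assumed normalization: 1.0 means identical traces, 0.0 means fully different."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def diagnose(unknown: Sequence[str],
             known_faults: Dict[str, List[Sequence[str]]]) -> Tuple[Optional[str], float]:
    """Return the label of the known fault whose trace is most similar to `unknown`."""
    best_label, best_score = None, -1.0
    for label, traces in known_faults.items():
        for trace in traces:
            score = similarity(unknown, trace)
            if score > best_score:
                best_label, best_score = label, score
    return best_label, best_score


# Hypothetical traces recorded while injecting known faults.
known_faults = {
    "ratings unavailable": [["productpage", "details", "reviews"]],
    "details unavailable": [["productpage", "reviews", "ratings"]],
}

# Hypothetical trace observed during an undiagnosed failure (one retried call).
unknown_trace = ["productpage", "reviews", "reviews", "ratings"]

result = diagnose(unknown_trace, known_faults)
print(result)
```

In this sketch the unknown trace is one edit away from the "details unavailable" trace and two edits away from the other, so the former is reported as the likely cause. A real system would likely encode richer span information (latency, status codes, call hierarchy) than bare service names.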

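Cite this article

Chen Hao, Xu Yuanjia, Wang Tao, Zhang Wenbo. Fault diagnosis method based on trace similarity matching. Computer Systems & Applications, 2021, 30(5): 1-11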

History
  • Received: 2020-08-31
  • Last revised: 2020-09-23
  • Published online: 2021-05-06