Abstract:Along with the rapid development of internet services, the distributed microservice-based application has gradually replaced the traditional application as one of the main forms of Internet applications. Distributed microservice-based applications boast scalability, high fault tolerance, and great availability, but they are often challenged by cumbersome installation, complicated deployment, and difficult maintenance. Kubernetes, as the most popular container-based cluster management system, is affected by coarse grains, inaccurate fault location, and other weaknesses. To address the above issues, this study proposes a fault detection method based on trace similarity matching: First, use injecting proxy to forward request traffic to collect tracking information about microservices. Then, collect the state information during normal operation of the system and record the performance of the system after the failure occurs by injecting known faults. Finally, take string edit distance as the standard for the execution tracking models of unknown and known faults. The edit distance serves as a standard to measure the similarity, and the possible cause of failure is identified. Experimental results show that the method can accurately describe the processing and execution tracking information of the request and find the cause of system failure with microservices as the granularity.