Abstract: As systems continue to grow in scale, their structure becomes increasingly complex. Rule-based methods struggle to diagnose composite faults that arise from the interaction of multiple systems, and they are poorly suited to predicting potential faults. First, this study uses the ELK platform (Elasticsearch, Logstash, Kibana) to centrally manage logs from multiple business systems in complex deployment scenarios. Next, it maps the relationships between the logs and the business systems, hosts, and processes that produce them. Finally, it extracts the log files related to system failures and uses these data to train a Long Short-Term Memory (LSTM) model in the deep learning framework TensorFlow, enabling real-time fault prediction for the system.
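The abstract does not say how the failure-related log data are turned into LSTM training samples. One common approach is to aggregate logs into fixed time buckets and slide a window over them, so that each sample is a short history and its label is whether a fault follows. The sketch below illustrates only that windowing step under stated assumptions: the per-bucket error counts, the fault flags, and the function name `make_windows` are hypothetical and are not taken from the study.

```python
from typing import List, Sequence, Tuple


def make_windows(
    error_counts: Sequence[int],
    fault_flags: Sequence[int],
    window: int,
) -> Tuple[List[List[List[float]]], List[int]]:
    """Build (timesteps, features) windows and next-step fault labels.

    Assumptions (hypothetical, for illustration only):
      error_counts[i]: number of error-level log lines in time bucket i.
      fault_flags[i]:  1 if a fault was recorded in bucket i, else 0.
    """
    if len(error_counts) != len(fault_flags):
        raise ValueError("error_counts and fault_flags must have the same length")
    if window < 1:
        raise ValueError("window must be at least 1")

    inputs: List[List[List[float]]] = []
    labels: List[int] = []
    # Each sample uses buckets [start, start + window) to predict bucket start + window.
    for start in range(len(error_counts) - window):
        # One feature per timestep, so each sample has shape (window, 1).
        inputs.append([[float(c)] for c in error_counts[start:start + window]])
        labels.append(int(fault_flags[start + window]))
    return inputs, labels


if __name__ == "__main__":
    counts = [0, 1, 0, 5, 9, 0, 0, 2]
    faults = [0, 0, 0, 0, 1, 0, 0, 0]
    X, y = make_windows(counts, faults, window=3)
    print(len(X), y)  # prints: 5 [0, 1, 0, 0, 0]
```

Passing `X` through `numpy.asarray` yields an array of shape `(samples, window, 1)`, which matches the `(batch, timesteps, features)` input layout expected by `tf.keras.layers.LSTM`. A real pipeline would likely use richer per-bucket features (for example, counts per host or process), which only changes the last dimension.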