Abstract: Hadoop is widely used as a distributed architecture for big data storage. At runtime it generates large volumes of log data that record device anomalies and provide important clues for locating and analyzing problems. However, traditional log anomaly detection models typically collect log data on a central server, which risks leaking sensitive information during data collection. Federated learning, a machine learning paradigm in which models are trained on local servers and only model parameters are aggregated on a central server, protects data privacy by keeping raw data local. This study proposes a log anomaly detection architecture based on federated learning, in which local and central servers cooperate on the detection task, avoiding the risk of leaking sensitive information during network transmission. A tree-based parser is used to standardize log templates. To capture complex patterns and anomalous behaviors in log data, a BiLSTM model with a self-attention mechanism serves as the local server model. The proposed method is validated in simulation experiments on publicly available distributed-system datasets. The results show that the model's comprehensive evaluation metrics remain stable, with accuracy above 93%, indicating high applicability.
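The parameter aggregation step described above can be illustrated with a minimal sketch. The abstract does not state which aggregation rule is used, so the sketch assumes a sample-size-weighted average in the style of federated averaging. The function name `federated_average`, the `Params` type, and the flat parameter layout are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical representation: each model is a dict mapping a parameter
# name to a flat list of floats. Real models would use tensors instead.
Params = Dict[str, List[float]]


def federated_average(client_params: List[Params],
                      client_sizes: List[int]) -> Params:
    """Combine local model parameters on the central server.

    Assumption: each client's parameters are weighted by its number of
    local training samples. Only parameters are passed in; raw log data
    stays on the local servers.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Params = {}
    for name, reference in client_params[0].items():
        # Weighted mean of each scalar entry across all clients.
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(len(reference))
        ]
    return averaged
```

In a training round under these assumptions, each local server would train its own model on its own logs, send the resulting parameters and its sample count to the central server, and receive the averaged parameters back as the starting point for the next round.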