Abstract:The core of penetration testing is to discover penetration paths, but not all penetration paths can be successful. Therefore, the optimal penetration path needs to be chosen based on the current system environment. In this context, firstly, this study models the environment as a Markov decision process (MDP) graph based on the attack graph and uses a value iteration algorithm to find the optimal penetration path. Secondly, a new replanning algorithm is proposed to deal with the failure of penetration actions in the MDP graph and find the optimal penetration path again. Finally, in view of the existence of multiple attack targets in the penetration testing process, this study proposes a multi-objective global optimal penetration path algorithm for MDP graphs. Experimentally, the proposed algorithm shows higher efficiency and stability in replanning tasks and is effective in multi-objective tasks, which can prevent unnecessary penetration actions from being executed.