Abstract: Optimizing traffic signal control strategies can improve the efficiency of vehicular traffic on roads and alleviate congestion. To overcome the difficulty that value-function-based deep reinforcement learning algorithms have in efficiently optimizing signal control strategies at single intersections, this study develops a sample-optimization method called modified proximal policy optimization (MPPO). The approach improves the quality of model sample selection by maximizing the use of the agent's objective function in the traditional PPO algorithm, and it takes a multi-dimensional traffic state vector as the model's observation input, enabling it to promptly track and exploit dynamic changes in road traffic conditions. The accuracy and effectiveness of the MPPO model are verified by comparing it against value-function-based reinforcement learning control methods in the urban traffic microsimulation software SUMO. Simulation experiments show that the proposed approach reflects real traffic scenarios more closely than the value-function-based methods: it significantly accelerates the convergence of cumulative vehicle waiting time, noticeably reduces average vehicle queue length and waiting time, and effectively improves traffic throughput at the intersection.
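For orientation, the sketch below shows the standard clipped PPO surrogate objective that MPPO modifies, together with an illustrative multi-dimensional traffic state vector. This is not the authors' implementation: the state layout (per-lane queue lengths, per-lane waiting times, one-hot signal phase) and all tensor values are assumptions made only to keep the example self-contained.

```python
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, eps=0.2):
    """Standard PPO clipped surrogate objective, returned as a loss
    to be minimized (the objective itself is maximized)."""
    ratio = torch.exp(log_probs_new - log_probs_old)  # pi_theta / pi_theta_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Assumed multi-dimensional traffic state for one intersection:
# per-lane queue lengths, per-lane cumulative waiting times, and a
# one-hot encoding of the current signal phase (illustrative layout,
# not the paper's exact observation design).
queue_lengths = torch.tensor([3., 7., 1., 5.])     # vehicles per lane
waiting_times = torch.tensor([12., 45., 4., 30.])  # seconds per lane
phase_one_hot = torch.tensor([0., 1., 0., 0.])     # current green phase
state = torch.cat([queue_lengths, waiting_times, phase_one_hot])

# Dummy batch showing the loss is computable as written.
log_probs_new = torch.log(torch.tensor([0.30, 0.55, 0.20]))
log_probs_old = torch.log(torch.tensor([0.25, 0.60, 0.15]))
advantages = torch.tensor([1.2, -0.4, 0.8])
print(ppo_clip_loss(log_probs_new, log_probs_old, advantages))
```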