Abstract: In recent years, job shop dynamic scheduling systems based on the Q-learning algorithm have relied on states, actions, and reward values that are set subjectively by hand. This leads to unsatisfactory learning performance and results that deviate substantially from the known optimal solutions. To address this, the elements of the Q-learning algorithm are redesigned around the characteristics of the job shop scheduling problem, and simulation tests are carried out on a standard benchmark case library. The results are compared with the known optimal solutions and with the hybrid gray wolf algorithm, the discrete cuckoo algorithm, and the quantum whale swarm algorithm, in terms of both closeness to the optimal solution and the minimum value obtained. The experimental results show that, compared with existing Q-learning approaches to the job shop scheduling problem reported in China, the proposed method comes significantly closer to the optimal solution, and that, compared with the swarm intelligence algorithms, its optimization ability is significantly better in most cases.