Abstract:Numerous real-world tasks require the collaboration of multiple agents, often with limited communication and incomplete observations. Deep multi-agent reinforcement learning (Deep-MARL) algorithms show remarkable effectiveness in tackling such challenging scenarios. Among these algorithms, QTRAN and QTRAN++ are representative approaches capable of learning a broad class of joint-action value functions with strong theoretical guarantees. However, the performance of QTRAN and QTRAN++ is hindered by their reliance on a single joint action-value estimator and their neglect of preprocessing agent observations. This study introduces a novel algorithm called OPTQTRAN, which significantly improves upon the performance of QTRAN and QTRAN++. Firstly, the study proposes a dual joint action-value estimator structure that leverages a decomposition network module to compute additional joint action-values. To ensure accurate computation of joint action-value estimators, it designs an adaptive network that facilitates efficient value function learning. Additionally, it introduces a multi-unit network that groups agent observations into different units for effective estimation of utility functions. Extensive experiments conducted on the widely-used StarCraft benchmark across diverse scenarios demonstrate that the proposed approach outperforms state-of-the-art MARL methods.