Abstract: Existing textual detoxification methods do not fully account for implicit toxicity, and the text they produce is often of low quality. To address these problems, a multi-stage, multi-objective detoxification framework, termed MSMO-Detox, is proposed. MSMO-Detox uses a three-stage cascade for precise detoxification. First, a marker-based toxicity attribution technique propagates decomposition vectors to identify tokens whose toxicity contribution exceeds a threshold and masks those tokens. Second, a product of experts (PoE) framework generates replacement tokens for the masked positions. Third, a multi-objective reranking strategy evaluates candidate sentences on implicit toxicity, fluency, and semantic preservation, and selects the highest-scoring candidate as the output. Experimental results show that on the MAgr, SBF, DynaHate, and Jigsaw datasets, MSMO-Detox reduces toxicity metrics by an average of 23.1%, 23.9%, 17.6%, and 5.6%, respectively, compared with the best baseline on each dataset. Fluency and semantic preservation also improve. MSMO-Detox demonstrates clear advantages in textual detoxification and can be applied to toxic-text style transfer, where it can serve as an important tool for eliminating cyber violence and improving online ecosystems.
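The first and third stages of the cascade can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `<mask>` placeholder, and the weighted-sum aggregation of objectives are all assumptions made for illustration, and the per-token toxicity contributions are taken as given rather than computed by the attribution technique. The PoE generation stage is omitted because the abstract does not specify the experts.

```python
from typing import Callable, List, Sequence

MASK = "<mask>"  # hypothetical placeholder token, not taken from the paper


def mask_toxic_tokens(
    tokens: Sequence[str],
    contributions: Sequence[float],
    threshold: float,
) -> List[str]:
    """Replace every token whose toxicity contribution exceeds the threshold."""
    if len(tokens) != len(contributions):
        raise ValueError("tokens and contributions must have the same length")
    return [MASK if c > threshold else t for t, c in zip(tokens, contributions)]


def rerank(
    candidates: Sequence[str],
    scorers: Sequence[Callable[[str], float]],
    weights: Sequence[float],
) -> str:
    """Return the candidate with the highest weighted sum of objective scores.

    Each scorer maps a sentence to a score where higher is better (for the
    toxicity objective this would be, e.g., 1 - toxicity). The weighted sum
    is an assumed aggregation rule; the source only says the highest-scoring
    candidate is selected.
    """
    if len(scorers) != len(weights):
        raise ValueError("scorers and weights must have the same length")

    def total(sentence: str) -> float:
        return sum(w * s(sentence) for s, w in zip(scorers, weights))

    return max(candidates, key=total)


if __name__ == "__main__":
    # Toy contributions standing in for the attribution stage's output.
    masked = mask_toxic_tokens(["you", "are", "stupid"], [0.05, 0.02, 0.90], 0.5)
    print(masked)

    # Toy scorers standing in for non-toxicity, fluency, and similarity models.
    non_toxicity = lambda s: 0.0 if "stupid" in s else 1.0
    fluency = lambda s: 1.0 / (1.0 + abs(len(s.split()) - 3))
    similarity = lambda s: 1.0 if s.startswith("you are") else 0.0
    best = rerank(
        ["you are stupid", "you are mistaken", "wrong"],
        [non_toxicity, fluency, similarity],
        [1.0, 1.0, 1.0],
    )
    print(best)
```

In practice the three scorers would be learned models for implicit toxicity, fluency, and semantic preservation, and the weights would need tuning per dataset; the toy lambdas here only make the control flow concrete.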