Abstract: The key for artificial intelligence to genuinely comprehend the world around us is to identify and disentangle the hidden, potentially interpretable factors underlying observed low-level sensory data. Disentangled representation learning aims to extract such independent, interpretable latent variables from data, while causally disentangled representation learning further emphasizes the causal relationships among these latent variables, thereby modeling the complexity of the real world more faithfully. In light of the growing importance of causal learning, this study provides a detailed and comprehensive introduction to methods that combine causal learning with disentangled representation learning, with the aim of supporting future developments in the field. It classifies causally disentangled representation learning according to the causal learning methods commonly employed, focusing on approaches that integrate structural causal models with flow-based disentangled representation learning, and it surveys commonly used datasets and evaluation metrics. Furthermore, it analyzes practical applications of causally disentangled representation learning in image generation, 3D pose estimation, and unsupervised domain adaptation, and offers a forward-looking perspective on future research directions. This study highlights potential exploration paths for researchers and practitioners, promoting continued development and innovation in this field.