Abstract: Question answering (Q&A) systems are a promising research direction in artificial intelligence and natural language processing. Early Q&A systems could accept questions and return answers only as natural-language text. In recent years, with the development of multimodal knowledge graphs and multimodal pre-training models, generalized Q&A systems that support queries across multiple modalities, such as text, image, audio, and video, have gradually become a new research hotspot, and their multimedia presentation of results is more intuitive and comprehensive. This study classifies Q&A systems into three types according to how their task objects have changed: dedicated Q&A systems, general Q&A systems, and multimodal Q&A systems. It analyzes the problems faced in developing each of the three types and highlights and summarizes the key technologies and methods used at each stage. In addition, it gives examples of industrial applications of Q&A systems and discusses future research directions.