Abstract: An affective speech generation method based on a conditional generative adversarial network (GAN) is proposed in this study. By introducing affective conditions and learning affective information from a speech database, the model can independently generate new affective speech with a specified emotion. The GAN consists of a generator and a discriminator. With TensorFlow as the learning framework, the conditional GAN model is trained on a large amount of affective speech; the generation network G and the discrimination network D form a dynamic “game process” that better learns and captures the conditional distribution of emotional speech data. The generated samples are close to natural speech signals of the original training content, exhibit diversity, and approximate speech data consistent with the target emotion. The proposed solution is evaluated on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus and a self-built emotional corpus, and it produces more accurate results than existing affective speech generation algorithms.
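To make the generator/discriminator “game process” concrete, the sketch below shows a minimal conditional GAN training step in TensorFlow, with both networks conditioned on a one-hot emotion label. It is only an illustrative outline, not the paper's implementation: the layer sizes, noise dimension, number of emotion classes, and acoustic-feature dimension (NOISE_DIM, NUM_EMOTIONS, FEATURE_DIM) are assumed placeholders.

```python
import tensorflow as tf

NOISE_DIM, NUM_EMOTIONS, FEATURE_DIM = 100, 4, 80  # assumed sizes, not from the paper

def build_generator():
    noise = tf.keras.Input(shape=(NOISE_DIM,))
    emotion = tf.keras.Input(shape=(NUM_EMOTIONS,))        # one-hot affective condition
    x = tf.keras.layers.Concatenate()([noise, emotion])
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    fake_features = tf.keras.layers.Dense(FEATURE_DIM, activation="tanh")(x)
    return tf.keras.Model([noise, emotion], fake_features, name="G")

def build_discriminator():
    features = tf.keras.Input(shape=(FEATURE_DIM,))
    emotion = tf.keras.Input(shape=(NUM_EMOTIONS,))
    x = tf.keras.layers.Concatenate()([features, emotion])
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    real_score = tf.keras.layers.Dense(1)(x)               # logit: real vs. generated
    return tf.keras.Model([features, emotion], real_score, name="D")

G, D = build_generator(), build_discriminator()
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
g_opt, d_opt = tf.keras.optimizers.Adam(1e-4), tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(real_features, emotion_labels):
    noise = tf.random.normal([tf.shape(real_features)[0], NOISE_DIM])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_features = G([noise, emotion_labels], training=True)
        real_score = D([real_features, emotion_labels], training=True)
        fake_score = D([fake_features, emotion_labels], training=True)
        # D tries to separate real speech features from generated ones under the
        # same emotion condition; G tries to make its samples indistinguishable.
        d_loss = bce(tf.ones_like(real_score), real_score) + \
                 bce(tf.zeros_like(fake_score), fake_score)
        g_loss = bce(tf.ones_like(fake_score), fake_score)
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, G.trainable_variables),
                              G.trainable_variables))
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, D.trainable_variables),
                              D.trainable_variables))
    return d_loss, g_loss
```

In this toy setup the generator maps a noise vector plus an emotion label to a fixed-length acoustic feature vector; in practice the paper's networks would operate on sequential speech features, but the alternating G/D updates above capture the adversarial training loop the abstract describes.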