Abstract: To model how the neural information-processing mechanism of auditory attention automatically analyzes and understands auditory scene content, this paper presents a top-down extraction model of auditory saliency attention. The model is based on the perceptual characteristics of the human ear's frequency transformation, and it combines speaker identification using a deep belief network with an auditory saliency model. Simulation results show that the proposed model is feasible and that it effectively highlights the saliency of the target speaker identified by the deep belief network.