Abstract: The simple contrastive learning of sentence embeddings (SimCSE) framework uses only the classification ([CLS]) token as the text vector and neglects the hierarchical information within the base model, so it extracts too few semantic features from the base model's output. Building on the SimCSE framework, this study proposes SimCSE with hierarchical feature fusion (SimCSE-HFF), a method that fuses the hierarchical features of pre-trained models. SimCSE-HFF is based on a dual-path parallel network whose short and long paths strengthen feature learning. The short path uses a convolutional neural network to learn local text features and reduce dimensionality, while the long path uses a bidirectional gated recurrent unit network to learn deep semantic information. In the long path, an autoencoder also fuses features from the other layers of the base model, which addresses the insufficient extraction of output features. On the Chinese and English datasets of the semantic textual similarity benchmark (STS-B), SimCSE-HFF outperforms traditional methods on the Spearman and Pearson correlation metrics for semantic similarity, and it shows improvements across different pre-trained models. It also outperforms the SimCSE framework on the downstream task of retrieval-based question answering, demonstrating better versatility.
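The Spearman and Pearson correlations named above are typically computed between the cosine similarity of each sentence-pair embedding and the human-annotated similarity score. The following is a minimal sketch of that evaluation in plain Python, not the authors' evaluation code; the function names (`cosine`, `pearson`, `ranks`, `spearman`, `evaluate`) and the input layout are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def evaluate(embedding_pairs, gold_scores):
    """Hypothetical helper: score each pair by cosine similarity, then
    correlate the predictions with the gold similarity scores."""
    predicted = [cosine(u, v) for u, v in embedding_pairs]
    return {
        "spearman": spearman(predicted, gold_scores),
        "pearson": pearson(predicted, gold_scores),
    }
```

Here `embedding_pairs` is assumed to be a list of `(u, v)` vector pairs produced by the sentence encoder and `gold_scores` the matching annotated similarity scores. Spearman depends only on the ranking of the predictions, so it is insensitive to any monotonic rescaling of the cosine scores, whereas Pearson measures linear agreement.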