Text Matching Based on SimCSE Framework Fused with Pre-trained Model Internal Hierarchical Features

    Abstract:

    The simple contrastive learning of sentence embeddings (SimCSE) framework uses only the classification ([CLS]) token as the text vector and ignores the hierarchical information inside the base model, so it extracts semantic features from the base model's output insufficiently. Building on the SimCSE framework, this study proposes SimCSE with hierarchical feature fusion (SimCSE-HFF), a method that fuses hierarchical features of pre-trained models. SimCSE-HFF is built on a dual-path parallel network in which a short path and a long path strengthen feature learning. The short path uses a convolutional neural network to learn local text features and reduce dimensionality, while the long path uses a bidirectional gated recurrent network to learn deep semantic information. In the long path, an autoencoder also fuses features from the other layers of the base model, which addresses the insufficient extraction of features from the model's output. On the Chinese and English Semantic Textual Similarity Benchmark (STS-B) datasets, SimCSE-HFF outperforms traditional methods on the Spearman and Pearson correlation metrics for semantic similarity, and the improvement holds across different pre-trained models. It also outperforms the SimCSE framework on the downstream task of retrieval-based question answering, demonstrating better versatility.
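The STS-B evaluation mentioned in the abstract can be illustrated with a minimal sketch: sentence pairs are scored by the cosine similarity of their embeddings, and those scores are compared with human similarity labels using the Pearson and Spearman correlations. The vectors and gold scores below are invented toy values, and the helper names (`cosine`, `pearson`, `ranks`, `spearman`) are hypothetical; none of this is taken from the paper's implementation or results.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


# Toy sentence-pair embeddings (assumed values, not model outputs).
pairs = [
    ([1.0, 0.0], [1.0, 0.1]),
    ([1.0, 0.0], [0.5, 0.5]),
    ([1.0, 0.0], [0.0, 1.0]),
    ([1.0, 1.0], [1.0, 0.8]),
]
# Toy human similarity labels on a 0-5 scale, one per pair.
gold = [4.8, 3.0, 0.2, 4.5]

predicted = [cosine(u, v) for u, v in pairs]
rho = spearman(predicted, gold)
r = pearson(predicted, gold)
print(f"Spearman: {rho:.4f}  Pearson: {r:.4f}")
```

Spearman depends only on how the pairs are ordered, so it reaches its maximum whenever the predicted ranking matches the gold ranking, as it does in this toy data. Pearson also reflects how linearly the raw scores track the labels.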

Get Citation

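Sheng Chengcheng, Chen Jindong, Zhang Jian. Text Matching Based on SimCSE Framework Fused with Pre-trained Model Internal Hierarchical Features. Computer Systems & Applications, 2024, 33(7): 103-111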

History
  • Received: January 10, 2024
  • Revised: February 07, 2024
  • Online: June 05, 2024