Quantitative Method for Assessing Relatedness of Literature Research Domains Based on Multi-dimensional Feature Fusion

Fund Project:

National Natural Science Foundation of China (L2324126)

    Abstract:

    Traditional literature feature extraction methods typically rely on domain features from a single dimension, which makes it difficult to accurately predict the degree of relatedness between fine-grained research domains. Fine-grained relatedness prediction requires highly precise domain-relatedness features, yet multi-dimensional extraction is prone to over-smoothing, which leads to incorrect relatedness predictions and low quantification accuracy. To address these problems, this paper proposes a method for quantifying the relatedness of literature research domains based on multi-dimensional feature fusion. First, building on the semantic content features that the traditional Doc2Vec model extracts from the literature, multiple relation-dimension graphs are constructed and assigned corresponding weights to make the structural relatedness features more comprehensive. Second, a multi-channel propagation strategy and an adaptive aggregation mechanism are introduced into the graph learning module; by optimizing how node relatedness features are aggregated, they mitigate the over-smoothing problem of traditional GCN and enable precise research-domain relatedness prediction between documents. Finally, a minimum n-dimensional sphere model covering the vector space of scholars' multi-dimensional relatedness features is constructed to quantitatively evaluate scholars' cross-domain research ability. Experimental results on a large-scale real-world literature dataset show that the tolerance-aware accuracy (TAA) of the proposed method reaches 0.68, which is 0.67, 0.08, and 0.02 higher than that of the Doc2Vec, GCN, and Sentence-BERT models, respectively. Performance also fluctuates little across different graph neural network models, indicating that the proposed method outperforms recent mainstream baseline models in both accuracy and stability.

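Cite this article

Han Jin, Wang Zhi, Shi Jin. Quantitative Method for Assessing Relatedness of Literature Research Domains Based on Multi-dimensional Feature Fusion. Computer Systems & Applications,,():1-12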

History
  • Received: 2024-11-10
  • Last revised: 2025-01-21
  • Published online: 2025-04-30