Novel News Article Comments Summarization Algorithm
Authors: 师昕 (Shi Xin), 赵雪青 (Zhao Xueqing)

    Abstract:

    To help readers quickly find the most informative and representative opinions among news comments, this paper proposes a novel news comment summarization algorithm and, based on it, designs a comment summarization system. The algorithm combines clustering with ranking. First, it groups all comments using a modified BorderFlow clustering algorithm. Second, within each cluster, it scores and ranks the comments with a PageRank-like algorithm and selects the top-ranked comments as the cluster's representatives. Finally, it re-ranks all selected comments with the MMR (Maximal Marginal Relevance) algorithm and outputs the top-K comments as the comment summary. NDCG (Normalized Discounted Cumulative Gain) and MAP (Mean Average Precision) results from simulation experiments indicate that the summaries produced by the proposed method are more effective and accurate and agree better with readers' intuition.
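The abstract does not say how BorderFlow was modified. For orientation only, the sketch below follows the flow-ratio idea of the original BorderFlow algorithm as we understand it. For a cluster X, the border b(X) contains the members that have a neighbour outside X, and n(X) contains the outside nodes adjacent to X. The flow ratio is F(X) = Ω(b(X), X) / Ω(b(X), n(X)), where Ω sums edge weights. A cluster grows greedily from a seed for as long as F does not drop. This is a simplified sketch, not the paper's modified variant. It assumes that nodes are comments and that edges carry positive similarity weights. The helper names (`make_graph`, `grow_cluster`, `cluster_all`) are hypothetical. Clusters may overlap, and the original algorithm's tie-breaking and cluster merging steps are omitted.

```python
# Illustrative sketch of BorderFlow-style clustering; all names are hypothetical.
from typing import Dict, Iterable, List, Set, Tuple

# Weighted undirected graph: node -> {neighbour: positive weight}.
Graph = Dict[str, Dict[str, float]]


def make_graph(edges: Iterable[Tuple[str, str, float]]) -> Graph:
    """Build a symmetric adjacency map from (u, v, weight) triples."""
    g: Graph = {}
    for u, v, w in edges:
        g.setdefault(u, {})[v] = w
        g.setdefault(v, {})[u] = w
    return g


def omega(g: Graph, xs: Set[str], ys: Set[str]) -> float:
    """Total weight of edges running from nodes in xs to nodes in ys."""
    return sum(w for x in xs for y, w in g[x].items() if y in ys)


def border(g: Graph, cluster: Set[str]) -> Set[str]:
    """b(X): members of the cluster with at least one neighbour outside it."""
    return {x for x in cluster if any(y not in cluster for y in g[x])}


def outside_neighbours(g: Graph, cluster: Set[str]) -> Set[str]:
    """n(X): non-members adjacent to at least one member."""
    return {y for x in cluster for y in g[x] if y not in cluster}


def flow_ratio(g: Graph, cluster: Set[str]) -> float:
    """F(X) = omega(b(X), X) / omega(b(X), n(X)); infinite if nothing flows out."""
    b = border(g, cluster)
    out_flow = omega(g, b, outside_neighbours(g, cluster))
    return float("inf") if out_flow == 0 else omega(g, b, cluster) / out_flow


def grow_cluster(g: Graph, seed: str) -> Set[str]:
    """Add the outside neighbours that maximise F, stopping once F would decrease."""
    cluster = {seed}
    while True:
        candidates = outside_neighbours(g, cluster)
        if not candidates:
            return cluster
        scores = {v: flow_ratio(g, cluster | {v}) for v in candidates}
        best = max(scores.values())
        chosen = {v for v, s in scores.items() if s == best}
        if flow_ratio(g, cluster | chosen) < flow_ratio(g, cluster):
            return cluster
        cluster |= chosen


def cluster_all(g: Graph) -> List[Set[str]]:
    """Seed a cluster from every node not yet covered (clusters may overlap)."""
    clusters: List[Set[str]] = []
    covered: Set[str] = set()
    for node in g:
        if node not in covered:
            c = grow_cluster(g, node)
            clusters.append(c)
            covered |= c
    return clusters
```

Consider a graph made of two triangles joined by a single weak edge of weight 0.1. On this graph, `cluster_all` returns the two triangles as separate clusters, because adding a node across the weak edge makes the flow ratio drop sharply.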
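The abstract describes the per-cluster ranking only as "PageRank-like". A common way to apply PageRank to text, in the spirit of LexRank and TextRank, works as follows. Each comment becomes a node in a graph whose edges are weighted by pairwise similarity, and damped power iteration then scores the nodes. The sketch below makes four assumptions of its own: bag-of-words cosine similarity, a damping factor of 0.85, uniform redistribution of the score held by comments that resemble nothing else, and whitespace tokenization in place of the word segmentation that real Chinese comments would need. `top_comments` and the other names are hypothetical.

```python
# Illustrative PageRank-style ranking of one cluster's comments; names are hypothetical.
import math
from collections import Counter
from typing import List


def bag_of_words(text: str) -> Counter:
    """Lower-cased whitespace tokens (an assumption; Chinese text needs segmentation)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def weighted_pagerank(sim: List[List[float]], d: float = 0.85,
                      iters: int = 100, tol: float = 1e-10) -> List[float]:
    """Damped power iteration; each node's out-edges are normalised by its total similarity."""
    n = len(sim)
    if n == 0:
        return []
    out = [sum(sim[j][k] for k in range(n) if k != j) for j in range(n)]
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Comments with no similar neighbour spread their score uniformly.
        dangling = sum(scores[j] for j in range(n) if out[j] == 0)
        new = []
        for i in range(n):
            inflow = sum(scores[j] * sim[j][i] / out[j]
                         for j in range(n) if j != i and out[j] > 0)
            new.append((1 - d) / n + d * (inflow + dangling / n))
        converged = max(abs(x - y) for x, y in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores


def top_comments(comments: List[str], top_n: int) -> List[str]:
    """Rank one cluster's comments by centrality and keep the top_n."""
    vecs = [bag_of_words(c) for c in comments]
    sim = [[cosine(u, v) for v in vecs] for u in vecs]
    scores = weighted_pagerank(sim)
    order = sorted(range(len(comments)), key=lambda i: scores[i], reverse=True)
    return [comments[i] for i in order[:top_n]]
```

The scores always sum to 1. A comment that shares no words with the rest of its cluster ends up with the lowest score.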
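MMR selects items greedily. At each step it takes the candidate D_i that maximises λ·Sim1(D_i, Q) − (1 − λ)·max over selected D_j of Sim2(D_i, D_j). The first term rewards relevance and the second penalises redundancy. Comment summarization has no explicit query Q, so the sketch below assumes that each comment's relevance comes from an earlier ranking stage, for example its PageRank-style score. It also assumes bag-of-words cosine similarity as the redundancy measure. The default λ = 0.7 and the name `mmr_select` are illustrative choices, not values or names from the paper.

```python
# Illustrative greedy MMR re-ranking; names and the default lam are hypothetical.
import math
from collections import Counter
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mmr_select(comments: List[str], relevance: List[float], k: int, lam: float = 0.7) -> List[str]:
    """Pick up to k comments, trading relevance off against similarity to earlier picks."""
    vecs = [Counter(c.lower().split()) for c in comments]
    remaining = list(range(len(comments)))
    selected: List[int] = []
    while remaining and len(selected) < k:
        # MMR score = lam * relevance - (1 - lam) * max similarity to already-selected items.
        scores = {
            i: lam * relevance[i]
            - (1 - lam) * max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
            for i in remaining
        }
        best = max(remaining, key=scores.__getitem__)
        selected.append(best)
        remaining.remove(best)
    return [comments[i] for i in selected]
```

For example, `mmr_select(["price too high", "price is too high", "delivery was slow"], [0.9, 0.85, 0.5], k=2, lam=0.5)` first keeps "price too high". It then chooses "delivery was slow" over the near-duplicate complaint. With `lam=1.0` the redundancy term vanishes, and the result is a plain ranking by relevance.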
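NDCG and MAP are standard ranking metrics, but the abstract does not state which variants were used. The sketch below uses the linear-gain form DCG@k = Σ rel_i / log2(i + 1). It normalises DCG by the DCG of the same judged items sorted into ideal order. Average precision is computed over binary relevance labels, and MAP is the mean over several ranked lists (for example, one summary per news article). All function names are hypothetical.

```python
# Illustrative NDCG and MAP computations; conventions and names are assumptions.
import math
from typing import List, Optional, Sequence


def dcg(rels: Sequence[float], k: Optional[int] = None) -> float:
    """DCG@k with linear gain: sum of rel_i / log2(i + 1) over positions i = 1..k."""
    top = list(rels)[:k]  # k=None keeps the whole list
    return sum(r / math.log2(pos + 2) for pos, r in enumerate(top))


def ndcg(rels: Sequence[float], k: Optional[int] = None) -> float:
    """DCG normalised by the DCG of the same labels sorted in descending order."""
    ideal = dcg(sorted(rels, reverse=True), k)
    return dcg(rels, k) / ideal if ideal > 0 else 0.0


def average_precision(rels: Sequence[int], num_relevant: Optional[int] = None) -> float:
    """AP over binary labels; num_relevant defaults to the relevant items in the list."""
    total = sum(1 for r in rels if r) if num_relevant is None else num_relevant
    if total == 0:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    return precision_sum / total


def mean_average_precision(runs: List[Sequence[int]]) -> float:
    """MAP: mean AP over several ranked lists, e.g. one per news article."""
    return sum(average_precision(r) for r in runs) / len(runs) if runs else 0.0
```

Under these conventions, `ndcg([3, 2, 3, 0, 1, 2])` is about 0.961 and `average_precision([1, 0, 1, 0])` is 5/6.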

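Cite this article:

师昕 (Shi Xin), 赵雪青 (Zhao Xueqing). 新型的面向新闻评论摘要采集算法 [Novel News Article Comments Summarization Algorithm]. 计算机系统应用 (Computer Systems & Applications), 2017, 26(1): 163-167.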

Article metrics
  • Page views: 1294
  • Downloads: 2069
  • HTML views: 0
  • Citations: 0
History
  • Received: 2016-04-12
  • Last revised: 2016-05-19
  • Published online: 2017-01-14
Copyright: Institute of Software, Chinese Academy of Sciences (中国科学院软件研究所)