Web Literature Collection System


Author: MA Chuang-Xin
Affiliation: College of Liberal Arts, Nanjing Normal University, Nanjing 210097, China
Fund Project: Major Program of the National Social Science Fund of China (10&ZD117); Major Project of Key Research Bases at Jiangsu Universities (2010JDXM023); Philosophy and Social Science Fund for Universities of the Jiangsu Provincial Department of Education (2011SJB7400 10); Natural Science Research Project of Jiangsu Universities (11KJD520009)


Abstract:

To make full use of the rich literature resources available on the Web, this paper designs WLES, a specialized Web literature collection system. The system integrates Web crawling and Web page cleaning, and it applies machine learning to the cleaning step: a cleaning model is learned from a training corpus and is then used to clean the crawled pages. Experiments show that the system performs well in both crawling and cleaning and can meet users' literature collection needs.

Key words: literature collection; machine learning; Web page cleaning; cleaning model
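
The cleaning step described in the abstract (learn a model from a labeled training corpus, then apply it to crawled pages) can be illustrated with a minimal sketch. This page does not state which features or learning algorithm WLES uses, so everything below is an assumption made for illustration: the three block features (text length, link density, punctuation count), the perceptron learner, the tiny `TRAIN` corpus, and the function names `features`, `train`, and `clean` are all hypothetical and are not taken from the paper.

```python
import re

TAG = re.compile(r"<[^>]+>")
ANCHOR = re.compile(r"<a\b[^>]*>(.*?)</a>", re.S | re.I)
PUNCT = re.compile(r"[.,;!?。，；！？]")


def features(block_html):
    """Map one HTML block to a small feature vector (hypothetical features)."""
    text = TAG.sub("", block_html)
    link_text = TAG.sub("", "".join(ANCHOR.findall(block_html)))
    n = max(len(text), 1)
    return [
        1.0,                                      # bias term
        min(len(text) / 100.0, 1.0),              # text length, capped at 1
        len(link_text) / n,                       # share of text inside links
        min(len(PUNCT.findall(text)) / 5.0, 1.0), # punctuation count, capped at 1
    ]


def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train(samples, epochs=20):
    """Perceptron training on (block_html, label) pairs; 1 = content, 0 = noise."""
    w = [0.0] * 4
    for _ in range(epochs):
        for block, label in samples:
            x = features(block)
            pred = 1 if score(w, x) > 0 else 0
            if pred != label:
                step = 1.0 if label == 1 else -1.0
                w = [wi + step * xi for wi, xi in zip(w, x)]
    return w


def clean(blocks, w):
    """Keep the text of blocks the model classifies as content."""
    return [TAG.sub("", b).strip() for b in blocks if score(w, features(b)) > 0]


# Hypothetical hand-labeled training corpus: prose blocks vs. navigation blocks.
TRAIN = [
    ("<p>This paper presents a crawler design, and reports results on three sites.</p>", 1),
    ("<p>Machine learning is applied to page cleaning; a model is trained on labeled blocks.</p>", 1),
    ('<div><a href="/">Home</a> <a href="/about">About</a> <a href="/login">Login</a></div>', 0),
    ('<div><a href="/share">Share</a></div>', 0),
]

model = train(TRAIN)
page = [
    "<p>Experiments show that the system cleans pages well, with few errors.</p>",
    '<div><a href="/x">Next</a> <a href="/y">Prev</a></div>',
]
print(clean(page, model))
```

In this sketch the learned weights end up penalizing link-heavy blocks and rewarding longer, punctuated text, so the navigation block is dropped and the prose block is kept. A real system would presumably need a much larger training corpus and a proper HTML parser rather than regular expressions.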

Cite this article:

MA Chuang-Xin. Web Literature Collection System. Computer Systems & Applications (计算机系统应用), 2012, 21(7): 9-12, 37


History

Received: 2011-11-03
Revised: 2011-12-01
