Received: April 10, 2014; Revised: May 9, 2014
Abstract: In studies of sampling methods for online social networks (OSNs), samples collected by acceptance-rejection sampling are commonly used as the "ground truth" against which other sampling methods are evaluated. Since OSN sites began upgrading their user ID systems from 32 bits to 64 bits, the acceptance rate of the original acceptance-rejection method has dropped to nearly zero. Taking Sina Weibo as an example, we analyze the distribution of its user IDs and, based on the characteristics of online social networks, propose an improved acceptance-rejection method called UNI64. The method analyzes the distribution of known valid IDs and applies clustering to divide the user ID space into valid intervals and vacant intervals. The sampler then skips the vacant intervals and generates candidate IDs only within the valid intervals, which effectively raises the acceptance rate even when valid IDs are extremely sparse in the ID space.
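The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the gap-based grouping in `build_intervals` merely stands in for the clustering step, which the abstract does not specify, and the names `build_intervals`, `sample_valid_ids`, `max_gap`, and the `is_valid` callback (which in practice would query the OSN for whether an ID exists) are all hypothetical. Candidates are drawn uniformly over the union of the valid intervals, so each ID inside them is equally likely to be proposed.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def build_intervals(known_ids, max_gap):
    """Group known valid IDs into closed intervals (lo, hi).

    Assumption: a new interval starts whenever two consecutive IDs are
    more than max_gap apart. This is a simple stand-in for clustering.
    """
    ids = sorted(set(known_ids))
    if not ids:
        return []
    intervals = []
    lo = hi = ids[0]
    for x in ids[1:]:
        if x - hi > max_gap:
            intervals.append((lo, hi))
            lo = x
        hi = x
    intervals.append((lo, hi))
    return intervals


def sample_valid_ids(intervals, is_valid, n, rng=random, max_tries=1_000_000):
    """Acceptance-rejection sampling restricted to the given intervals.

    Returns (accepted_ids, number_of_candidates_tried). Sampling is with
    replacement, so the same ID may be accepted more than once.
    """
    lengths = [hi - lo + 1 for lo, hi in intervals]
    cum = list(accumulate(lengths))   # cumulative interval sizes
    total = cum[-1]
    accepted, tries = [], 0
    while len(accepted) < n and tries < max_tries:
        tries += 1
        r = rng.randrange(total)      # uniform position in the concatenated intervals
        k = bisect_right(cum, r)      # index of the interval containing position r
        offset = r - (cum[k - 1] if k else 0)
        candidate = intervals[k][0] + offset
        if is_valid(candidate):       # accept only IDs that actually exist
            accepted.append(candidate)
    return accepted, tries
```

The ratio `len(accepted) / tries` gives the empirical acceptance rate, which can be compared with the rate obtained when candidates are drawn from the full 64-bit ID space.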
Funding: Beijing Higher Education Young Elite Teacher Project (YETP0506)
Cite this article:
XU Nan-Shan, LI Hao, LU Gang. UNI64 Sampling Method on Online Social Networks. COMPUTER SYSTEMS APPLICATIONS, 2014, 23(12): 206-212