Table Data Simulation Generating Algorithm Based on Not-Temporal Attribute

doi:10.15888/j.cnki.csa.006195


Volume 27, Issue 2, 2018, pp. 30-36

Author: ZHANG Rui, XIAO Ru-Liang, NI You-Cong, DU Xin, CAI Sheng-Zhen
Affiliation: College of Mathematics and Informatics, Fujian Normal University, Fuzhou 350117, China; Fujian Provincial Digit Fujian Internet-of-Things Laboratory of Environmental Monitoring, Fujian Normal University, Fuzhou 350117, China


Abstract:

This paper proposes a table data simulation generating algorithm based on non-temporal attribute correlation. The algorithm addresses the difficulty of modeling correlations between non-temporal attributes when developing a big data simulation generator, and it plays an important role in evaluating the simulated big data that such a generator produces. First, two key non-temporal attributes are extracted from the data set and their twofold (pairwise) frequency statistics are computed. Then, based on these statistics, the maximal information coefficient (MIC) is calculated to measure the dependence between the two variables. The stretched exponential (SE) distribution is used to fit the relationship and build the correlation model. Finally, data are generated in a two-dimensional matrix with this model. Experimental results show that the algorithm effectively describes the data characteristics of the real data set.

Key words: data simulation generator; correlation; maximal information coefficient (MIC); stretched exponential distribution; attribute correlation
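The pipeline outlined in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the function names (`joint_frequency`, `fit_stretched_exponential`, `generate_matrix`), the toy data, and the grid search over the stretch exponent `c` are all assumptions made for illustration. The sketch assumes the rank-ordered form of the SE model, freq_i^c = -a log(i) + b, where i is the rank of a value pair by frequency. The MIC step is omitted because it needs a dedicated estimator.

```python
import numpy as np


def joint_frequency(x, y):
    """Count occurrences of each (x, y) value pair in a 2D matrix."""
    x_vals, x_idx = np.unique(np.asarray(x).ravel(), return_inverse=True)
    y_vals, y_idx = np.unique(np.asarray(y).ravel(), return_inverse=True)
    counts = np.zeros((x_vals.size, y_vals.size), dtype=np.int64)
    np.add.at(counts, (x_idx, y_idx), 1)  # unbuffered add handles repeated pairs
    return x_vals, y_vals, counts


def fit_stretched_exponential(freqs, c_grid=None):
    """Fit freq_i**c = -a*log(i) + b over ranks i; keep the c with the best R^2.

    Assumption: a simple grid search over c, not the paper's fitting procedure.
    """
    if c_grid is None:
        c_grid = np.linspace(0.05, 1.0, 20)
    f = np.sort(np.asarray(freqs, dtype=float).ravel())[::-1]
    f = f[f > 0]  # only value pairs that actually occur are ranked
    log_rank = np.log(np.arange(1, f.size + 1))
    best = None
    for c in c_grid:
        fc = f ** c
        ss_tot = ((fc - fc.mean()) ** 2).sum()
        if f.size < 3 or ss_tot == 0:
            continue
        slope, intercept = np.polyfit(log_rank, fc, 1)
        ss_res = ((fc - (slope * log_rank + intercept)) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        if best is None or r2 > best[3]:
            best = (float(c), float(-slope), float(intercept), float(r2))
    if best is None:
        raise ValueError("need at least three positive frequencies that are not all equal")
    return best  # (c, a, b, r2)


def generate_matrix(counts, c, a, b, n_records, rng):
    """Draw a synthetic count matrix whose rank-frequency curve follows the fit."""
    flat = counts.ravel()
    order = np.argsort(-flat, kind="stable")  # cell indices, most frequent first
    n_pos = int((flat > 0).sum())
    ranks = np.arange(1, n_pos + 1)
    model = np.clip(b - a * np.log(ranks), 0.0, None) ** (1.0 / c)
    if model.sum() <= 0:
        raise ValueError("fitted model assigns no mass to any cell")
    probs = np.zeros(flat.size)
    probs[order[:n_pos]] = model / model.sum()  # unseen pairs keep probability 0
    return rng.multinomial(n_records, probs).reshape(counts.shape)


# Toy data (hypothetical): two skewed, correlated integer attributes.
rng = np.random.default_rng(0)
x = rng.geometric(0.3, size=5000)
y = x + rng.integers(0, 3, size=5000)

_, _, counts = joint_frequency(x, y)
c, a, b, r2 = fit_stretched_exponential(counts)
synthetic = generate_matrix(counts, c, a, b, n_records=10000, rng=rng)
print(f"c={c:.2f}, a={a:.3f}, b={b:.3f}, R^2={r2:.3f}")
print(synthetic.shape, int(synthetic.sum()))
```

In this sketch the synthetic matrix reuses the observed rank order of value pairs and only replaces their frequencies with the fitted SE curve, so pairs never seen in the source data are never generated. A real generator would likely need a rule for unseen pairs and a check of the MIC value before trusting the fit.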


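Citation

ZHANG Rui, XIAO Ru-Liang, NI You-Cong, DU Xin, CAI Sheng-Zhen. Table Data Simulation Generating Algorithm Based on Not-Temporal Attribute. Computer Systems & Applications (计算机系统应用), 2018, 27(2): 30-36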


History

Received: May 02, 2017
Revised: May 19, 2017
Online: February 05, 2018
