Non-normal Data Synthesis Based on Mixed Data Type Correlation Measurement
    Abstract:

    Data plays an extremely important role in research and development in fields such as machine learning and artificial intelligence. However, real-world factors such as privacy issues, data scarcity, and poor data quality often prevent data consumers from obtaining real datasets that meet their requirements. To address this, this study develops a non-normal data synthesis algorithm (KMSI) that improves on the sampling-iteration (SI) technique. KMSI uses a mixed-type correlation coefficient matrix to reduce measurement errors in several steps of the SI technique, including target setting and the control loops, and it replaces Bootstrap sampling with kernel density estimation sampling so that real data is not reused. Experimental results show that, compared with the SI technique, KMSI can handle complex, mixed-type datasets and does not include real data in the synthetic results. Furthermore, compared with other enhancement methods, KMSI gives users more freedom to customize the sample size of the synthetic dataset.
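
    The difference between Bootstrap sampling and kernel density estimation sampling mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' KMSI implementation: it handles a single numeric column, the function names `bootstrap_sample` and `kde_sample` are hypothetical, the Gaussian kernel and the normal-reference bandwidth rule are assumptions, and the input column is made-up data.

```python
import random
import statistics


def bootstrap_sample(data, n, rng):
    """Bootstrap: draw n values with replacement, so every output is a real record."""
    return [rng.choice(data) for _ in range(n)]


def kde_sample(data, n, rng, bandwidth=None):
    """Draw n values from a Gaussian kernel density estimate fitted to data.

    Sampling from a Gaussian KDE is equivalent to picking a real point at
    random and adding Gaussian noise whose standard deviation is the bandwidth.
    """
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption; the abstract does
        # not state which bandwidth KMSI uses).
        bandwidth = 1.06 * statistics.stdev(data) * len(data) ** (-1 / 5)
    return [rng.gauss(rng.choice(data), bandwidth) for _ in range(n)]


rng = random.Random(0)

# Hypothetical skewed (non-normal) column standing in for real data.
real = [rng.expovariate(1.0) for _ in range(200)]

# The synthetic sample size is chosen freely and need not match len(real).
boot = bootstrap_sample(real, 500, rng)
synth = kde_sample(real, 500, rng)

real_set = set(real)
print("bootstrap reuses real values:", all(v in real_set for v in boot))
print("KDE sample reuses real values:", any(v in real_set for v in synth))
```

    A plain Gaussian kernel can place synthetic values outside the support of the real data (for example, negative values for a strictly positive column), so a practical implementation would likely need a boundary correction or a transformation; the abstract does not say how KMSI handles this.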

Get Citation

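Wang Chundong, Zhang Shipeng. Non-normal Data Synthesis Based on Mixed Data Type Correlation Measurement. Computer Systems & Applications, 2024, 33(3): 195-205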

History
  • Received: September 23, 2023
  • Revised: October 20, 2023
  • Online: January 19, 2024