Virtual Sample Generation Method Based on Semantic Meaning Extraction of VAE’s Latent Variables
    Abstract:

    The application of artificial intelligence has been driving productivity gains and technological revolution across industries. Traditional industries face small-sample and imbalanced-data problems because samples are rare, costly to collect, and subject to privacy constraints. However, existing sample generation methods often struggle to balance generalization and validity. The proposed virtual sample generation method, based on semantic meaning extraction of the latent variables of a variational autoencoder (VAE), uses the weights of the encoder neural network to measure the dependency between input features and latent variables. The method achieves flexible sample generation by explicitly controlling individual dimensions of the latent variables. The generated samples satisfy the population distribution but are not necessarily included in the original samples. Results of sample expansion on civil building structural safety databases show that the method can generate valid samples in a controllable manner and mitigates the small-sample and imbalanced-data problems.
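The abstract does not give the exact dependency formula, so the following is only a minimal sketch of one plausible reading: chain the absolute encoder weights into a feature-to-latent score matrix, then generate samples by sweeping a single latent dimension. Everything here is an assumption for illustration, including the function names `feature_latent_dependency` and `traverse_latent`, the two-layer encoder shape, and the random stand-in weights and linear decoder that take the place of a trained VAE.

```python
import numpy as np

def feature_latent_dependency(weight_matrices):
    """Chain absolute encoder weights into an (n_features, n_latent) score matrix.

    Assumption: dependency is approximated by the product of absolute weight
    matrices, ignoring biases and activations. The paper may define it differently.
    """
    score = np.abs(weight_matrices[0])
    for w in weight_matrices[1:]:
        score = score @ np.abs(w)
    # Normalize each latent column so its feature scores sum to 1.
    return score / score.sum(axis=0, keepdims=True)

def traverse_latent(decode, z_base, dim, values):
    """Decode copies of z_base with latent dimension `dim` set to each value."""
    z = np.tile(z_base, (len(values), 1))
    z[:, dim] = values
    return decode(z)

rng = np.random.default_rng(0)
n_features, n_hidden, n_latent = 5, 8, 2

# Stand-in weights (hypothetical); a real run would read these from a trained
# VAE encoder: input -> hidden, then hidden -> latent mean.
w1 = rng.normal(size=(n_features, n_hidden))
w2 = rng.normal(size=(n_hidden, n_latent))
dependency = feature_latent_dependency([w1, w2])

# Stand-in linear decoder (hypothetical), used only to make the sketch runnable.
w_dec = rng.normal(size=(n_latent, n_features))

def decode(z):
    return z @ w_dec

# Sweep latent dimension 0 while holding the other dimension at zero.
samples = traverse_latent(decode, np.zeros(n_latent), dim=0,
                          values=np.linspace(-2.0, 2.0, 5))
print(dependency.round(3))
print(samples.shape)
```

Each column of `dependency` can be read as a rough ranking of which input features a latent dimension depends on most, which is what would let a user pick the dimension to sweep when targeting a particular feature.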

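Citation:

Wang Junjie, Jiao Ke, Peng Zixiang, Tan Lihong, Wang Wenbo. Virtual Sample Generation Method Based on Semantic Meaning Extraction of VAE’s Latent Variables. Computer Systems & Applications, 2022, 31(3): 255-261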

History
  • Received: April 28, 2021
  • Revised: May 28, 2021
  • Online: January 24, 2022