Adaptation Fine-tuning Based on Singular Value Decomposition
Author: 林志鹏, 郭峥嵘, 张伟志, 郭躬德
    Abstract:

    The rise of large language models has profoundly impacted natural language processing, and as computational resources grow and model sizes expand, their potential applications become increasingly evident. However, the widely used low-rank adaptation (LoRA) method faces challenges in fine-tuning efficiency and storage cost as models scale up. To address this issue, this study proposes an adaptation fine-tuning method based on singular value decomposition (SVD). The method trains only the diagonal matrix of singular values and a scaling vector obtained from the SVD, improving performance on multiple natural language processing tasks while reducing training cost. Experimental results on the GLUE and E2E benchmarks show that the proposed method outperforms other methods with a trainable-parameter budget of the same order of magnitude. Compared with commonly used parameter-efficient fine-tuning methods, it offers clear advantages in reducing the number of trainable parameters and improving fine-tuning efficiency, achieving the largest performance gain per trainable parameter in the fine-tuning-efficiency experiments. Future work will focus on optimizing the method for more efficient fine-tuning across a wider range of tasks and larger models.
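
    To make the idea in the abstract concrete, the following PyTorch sketch freezes the singular vectors of a pretrained weight matrix and trains only the singular values and a per-output scaling vector. This is an illustrative assumption, not the authors' implementation: the class name SVDAdapterLinear and the placement of the scaling vector are hypothetical choices.

```python
# Minimal sketch (not the authors' code): the frozen weight W is factored as
# U @ diag(s) @ V^T via SVD, and only the singular values `s` and a per-output
# scaling vector `d` are trained. Singular vectors and bias stay frozen.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SVDAdapterLinear(nn.Module):
    def __init__(self, linear: nn.Linear):
        super().__init__()
        W = linear.weight.data                        # (out_features, in_features)
        U, S, Vh = torch.linalg.svd(W, full_matrices=False)
        self.register_buffer("U", U)                  # frozen left singular vectors
        self.register_buffer("Vh", Vh)                # frozen right singular vectors
        # Trainable parameters: the diagonal of Sigma and a scaling vector.
        self.s = nn.Parameter(S.clone())
        self.d = nn.Parameter(torch.ones(W.shape[0]))
        if linear.bias is not None:
            self.register_buffer("bias", linear.bias.data.clone())  # frozen bias
        else:
            self.bias = None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Adapted weight: diag(d) @ U @ diag(s) @ Vh, rebuilt on the fly.
        W_adapted = self.d.unsqueeze(1) * ((self.U * self.s) @ self.Vh)
        return F.linear(x, W_adapted, self.bias)
```

    In this sketch, a d_out × d_in layer contributes only min(d_out, d_in) + d_out trainable values, which illustrates how training just the diagonal matrix and a scaling vector can keep the parameter count well below that of a rank-r LoRA update, which adds r(d_in + d_out) trainable parameters per layer.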

Citation: 林志鹏, 郭峥嵘, 张伟志, 郭躬德. Adaptation fine-tuning based on singular value decomposition. 计算机系统应用 (Computer Systems & Applications), 2025, 34(1): 276–284.
History
  • Received: June 06, 2024
  • Revised: July 10, 2024
  • Online: November 25, 2024