Specific Audio Retrieval Method Based on Compressed Sensing and Audio Fingerprint
Authors: Zhao Wenbing, Jia Maoshen, Wang Qi
Funding: National Natural Science Foundation of China (61971015)

    Abstract:

    Existing audio retrieval systems suffer from large sample-audio feature libraries and slow retrieval. To address this, this paper proposes a specific audio retrieval method based on compressed sensing and audio-fingerprint dimensionality reduction. In the training stage, the sample audio signal is first sparsified, and the sparsified audio data is compressed with a compressed sensing algorithm. Next, audio fingerprints are extracted from the compressed signal. Then a discrete Gini coefficient is computed for each dimension of the audio fingerprint and used to reduce the fingerprint's dimensionality, which yields the retrieval feature library. In the retrieval stage, features are extracted from the query audio with the same algorithm used in training and matched against the feature library to produce the retrieval result. Experimental results show that the proposed method greatly reduces the storage required for the sample audio database and improves retrieval speed while maintaining good retrieval accuracy.
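The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the sparsifying transform, the measurement matrix, the fingerprint scheme, or the exact definition of the discrete Gini coefficient, so everything below is an assumption chosen for illustration: a DCT with top-k thresholding for sparsification, a seeded Gaussian measurement matrix, sign-of-energy-difference bits in the style of Haitsma and Kalker, the Gini impurity 1 - p0^2 - p1^2 of each bit column with the highest-scoring columns kept, and bit error rate for matching. All function names and parameter values are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)
    return m

def compress_frames(signal, frame_len=256, n_meas=64, keep=32, seed=0):
    """Sparsify each frame in the DCT domain, then take random measurements.

    Assumption: the same seeded Gaussian matrix is used for training and
    retrieval, so both stages produce comparable measurements.
    """
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    coeffs = frames @ dct_matrix(frame_len).T
    # Keep only the `keep` largest-magnitude coefficients in every frame.
    thresh = np.sort(np.abs(coeffs), axis=1)[:, -keep][:, None]
    sparse = np.where(np.abs(coeffs) >= thresh, coeffs, 0.0)
    rng = np.random.default_rng(seed)
    phi = rng.standard_normal((n_meas, frame_len)) / np.sqrt(n_meas)
    return sparse @ phi.T  # shape: (n_frames, n_meas)

def fingerprint(meas, n_bands=17):
    """Binary fingerprint from energy differences across bands and frames."""
    groups = np.array_split(meas ** 2, n_bands, axis=1)
    energy = np.stack([g.sum(axis=1) for g in groups], axis=1)
    d = energy[:, :-1] - energy[:, 1:]             # adjacent-band difference
    return (d[1:] - d[:-1] > 0).astype(np.uint8)   # adjacent-frame difference

def gini_per_dimension(bits):
    """Gini impurity of the 0/1 distribution in each fingerprint column."""
    p = bits.mean(axis=0)
    return 1.0 - p ** 2 - (1.0 - p) ** 2

def select_dimensions(bits, n_keep):
    """Indices of the n_keep columns with the highest Gini impurity."""
    return np.sort(np.argsort(gini_per_dimension(bits))[-n_keep:])

def build_library(clips, n_keep=8):
    """Training stage: fingerprint every clip, then reduce dimensions."""
    prints = [fingerprint(compress_frames(c)) for c in clips]
    dims = select_dimensions(np.vstack(prints), n_keep)
    return [p[:, dims] for p in prints], dims

def bit_error_rate(a, b):
    """Fraction of differing bits over the frames both fingerprints share."""
    n = min(len(a), len(b))
    return float(np.mean(a[:n] != b[:n]))

def retrieve(query, library, dims):
    """Retrieval stage: same processing as training, then nearest match."""
    q = fingerprint(compress_frames(query))[:, dims]
    return int(np.argmin([bit_error_rate(q, p) for p in library]))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    clips = [rng.standard_normal(256 * 40) for _ in range(3)]
    library, dims = build_library(clips)
    noisy_query = clips[1] + 0.01 * rng.standard_normal(256 * 40)
    print("best match index:", retrieve(noisy_query, library, dims))
```

In this sketch the stored library holds only the selected fingerprint columns, which is where the storage saving would come from; how many columns to keep and which Gini ranking to use are tuning choices that the abstract does not specify.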

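Cite this article

Zhao WB, Jia MS, Wang Q. Specific audio retrieval method based on compressed sensing and audio fingerprint. Computer Systems & Applications, 2020, 29(8): 165-172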

History
  • Received: 2020-01-19
  • Last revised: 2020-02-02
  • Published online: 2020-07-31
  • Publication date: 2020-08-15