Improved VSM Algorithm in Species Identification Based on 16S rRNA Gene Sequences
    Abstract:

    In species identification, the traditional algorithm relies on BLAST, which is regarded as the authoritative method but suffers from a complex computation process and high time and memory costs. In this study, we propose an improved vector space model (VSM) algorithm based on the K-string composition vector method, and give alternative norm-based formulas for calculating the genetic distance between species in a Banach space, for reference by other researchers. The improved method is evaluated on two aspects: computational efficiency and species identification results. The results show that the improved VSM algorithm based on the 2-norm takes markedly less computation time than BLAST, and its classification results show good consistency and convergence with the comparison results in terms of detection rate.
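    The abstract names the ingredients of the approach (K-string composition vectors, a vector space model, and norm-based genetic distances) without giving formulas. The following is a minimal sketch under stated assumptions, not the authors' improved algorithm: each sequence is mapped to the relative frequencies of all 4^k overlapping k-mers (raw frequencies only; the original composition vector method additionally subtracts a Markov-model background prediction, which is omitted here), the distance between two sequences is taken as the p-norm of the difference of their vectors (p = 2 by default, the variant the abstract reports timing for), and a query is assigned to its nearest reference. The function names `kmer_vector`, `p_norm_distance`, and `identify` and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product
import math

ALPHABET = "ACGT"
VALID = set(ALPHABET)

def kmer_vector(seq: str, k: int) -> list[float]:
    """Relative frequencies of all 4**k overlapping k-mers, in lexicographic order.

    Windows containing characters other than A/C/G/T (e.g. N) are skipped.
    No Markov background subtraction is applied (an assumption of this sketch).
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= VALID
    )
    total = sum(counts.values())
    kmers = ("".join(t) for t in product(ALPHABET, repeat=k))
    return [counts[w] / total if total else 0.0 for w in kmers]

def p_norm_distance(u: list[float], v: list[float], p: float = 2.0) -> float:
    """p-norm of u - v: p=1 Manhattan, p=2 Euclidean, p=math.inf maximum norm."""
    diffs = [abs(a - b) for a, b in zip(u, v)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

def identify(query: str, references: dict[str, str], k: int = 3, p: float = 2.0) -> str:
    """Nearest-neighbour assignment: name of the reference closest to the query.

    Reference vectors are recomputed on every call for brevity; in practice
    they would be computed once and cached.
    """
    q = kmer_vector(query, k)
    return min(
        references,
        key=lambda name: p_norm_distance(q, kmer_vector(references[name], k), p),
    )

if __name__ == "__main__":
    # Toy sequences for illustration only, not real 16S rRNA data.
    refs = {"taxon_A": "AAAAAAAAAA", "taxon_C": "CCCCCCCCCC"}
    print(identify("AAAAAAAACA", refs, k=2))  # prints taxon_A
```

Because no alignment is performed, each comparison reduces to arithmetic on fixed-length vectors, and reference vectors could be precomputed once (this sketch recomputes them for brevity). Passing `p=1` or `p=math.inf` switches to the 1-norm or the maximum norm, which illustrates what "alternative norm-based formulas" can mean in practice.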

Citation:

Zhu Bin, Qi Heyuan, Ma Juncai. Improved VSM Algorithm in Species Identification Based on 16S rRNA Gene Sequences. Computer Systems & Applications, 2018, 27(9): 163-169.

History
  • Received: February 1, 2018
  • Revised: February 28, 2018
  • Online: August 17, 2018
Copyright: Institute of Software, Chinese Academy of Sciences