Metagenomic DNA Sequence Binning Based on Affinity Propagation
Authors: Nie Pengyu, Pan Weihua, Xu Yun
Funding:

National Natural Science Foundation of China, General Program (60970085)

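    Summary:

    With the rapid development of next-generation sequencing technologies, metagenomics has become a new research hotspot. The metagenomic sequence binning problem calls for reference-free methods that effectively separate metagenomic sequences containing multiple species. To this end, we propose a metagenomic species binning algorithm that combines similarity information with structural information, and we introduce affinity propagation clustering to cluster the sequences by species. Experimental data show that the method achieves high clustering accuracy and runs quickly. We have also developed metagenomic sequence binning software based on this method.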

    Abstract:

    With the rapid development of next-generation sequencing (NGS) technologies, metagenomics has become a new research hotspot. However, research in metagenomics faces the problem of binning: the identification and taxonomic characterization of NGS short reads. To address this problem, this paper first analyzes the characteristics of next-generation sequencing technology and the statistical characteristics of metagenomic sequences, and then proposes a new clustering method for DNA sequence binning. Test results show that the method achieves good clustering accuracy. We have also developed MetaBinning, a software tool for metagenomic binning based on this algorithm.
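The abstract names affinity propagation as the clustering step, but this page does not state the similarity measure or the parameters used. As a rough illustration only, the following sketch applies textbook affinity propagation (responsibility and availability message passing) to toy reads. Everything specific in it is an assumption made for the example: dinucleotide frequency profiles as features, negative squared Euclidean distance as similarity, the median similarity as preference, a damping factor of 0.8, 300 iterations, and the function names themselves. It is not the MetaBinning implementation.

```python
import random
from itertools import product


def kmer_profile(seq, k=2):
    """Return the normalized k-mer frequency vector of a DNA read."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values()) or 1
    return [counts[kmer] / total for kmer in kmers]


def similarity_matrix(profiles):
    """Negative squared Euclidean distance; the diagonal holds the preference."""
    n = len(profiles)
    s = [[-sum((x - y) ** 2 for x, y in zip(p, q)) for q in profiles]
         for p in profiles]
    off_diag = sorted(s[i][j] for i in range(n) for j in range(n) if i != j)
    preference = off_diag[len(off_diag) // 2]  # assumed: median similarity
    for i in range(n):
        s[i][i] = preference
    return s


def affinity_propagation(s, damping=0.8, iterations=300):
    """Return, for each point, the index of the exemplar it is assigned to."""
    n = len(s)
    r = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    a = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) <- s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        for i in range(n):
            scores = [a[i][k] + s[i][k] for k in range(n)]
            best = max(range(n), key=scores.__getitem__)
            runner_up = max(v for k, v in enumerate(scores) if k != best)
            for k in range(n):
                target = s[i][k] - (runner_up if k == best else scores[best])
                r[i][k] = damping * r[i][k] + (1 - damping) * target
        # a(i,k) <- min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
        # a(k,k) <- sum over i' != k of max(0, r(i',k))
        for k in range(n):
            positive = [max(0.0, r[i][k]) for i in range(n)]
            others = sum(positive) - positive[k]
            for i in range(n):
                if i == k:
                    target = others
                else:
                    target = min(0.0, r[k][k] + others - positive[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * target
    exemplars = [k for k in range(n) if r[k][k] + a[k][k] > 0]
    if not exemplars:  # fallback: keep the single strongest candidate
        exemplars = [max(range(n), key=lambda k: r[k][k] + a[k][k])]
    labels = [max(exemplars, key=lambda k: s[i][k]) for i in range(n)]
    for k in exemplars:
        labels[k] = k
    return labels


# Toy data: six AT-rich and six GC-rich synthetic reads (not real genomes).
rng = random.Random(0)


def random_read(weights, length=200):
    return "".join(rng.choices("ACGT", weights=weights, k=length))


reads = [random_read([4, 1, 1, 4]) for _ in range(6)]
reads += [random_read([1, 4, 4, 1]) for _ in range(6)]
labels = affinity_propagation(
    similarity_matrix([kmer_profile(read) for read in reads]))
print(labels)  # one exemplar index per read
```

Unlike k-means, affinity propagation does not take the number of clusters as input; the preference on the diagonal of the similarity matrix controls how many exemplars emerge, which suits binning when the number of species in a sample is unknown.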

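Cite this article

Nie Pengyu, Pan Weihua, Xu Yun. Metagenomic DNA Sequence Binning Based on Affinity Propagation. Computer Systems & Applications, 2013, 22(11): 165-170, 142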

History
  • Received: 2013-04-22
  • Last revised: 2013-05-13
  • Published online: 2013-11-22