Received:April 25, 2017 Revised:May 11, 2017
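Chinese abstract (translated): Constructing the feature vector is a key problem in protein secondary structure prediction. Existing methods typically use only the BLOSUM62 evolutionary matrix to generate the PSSM matrix and give little consideration to the amino acid residue mutations that occur during protein evolution. This paper proposes constructing protein feature vectors from multiple evolutionary matrices by fusing PSSM matrices of different evolutionary times, so that the vectors reflect both the position information of amino acids in the sequence and the effect of mutations at amino acid sites during sequence evolution. Feature vectors are constructed by combining matrices of different degrees of evolution; logistic regression, random forest, and a multi-class support vector machine are used as the prediction tools; parameters are optimized by grid search and cross-validation; and several groups of experiments are carried out on the public datasets RS126, CB513, and 25PDB. The comparative results show that the proposed feature vector construction method based on multiple evolutionary matrices effectively improves the prediction accuracy of protein secondary structure.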
Abstract: The construction of feature vectors is a key issue in protein secondary structure prediction. Existing methods typically use only the BLOSUM62 matrix to generate the PSSM, which neglects the amino acid residue mutations that occur during protein evolution. In this study, we propose constructing feature vectors by combining PSSM matrices of different evolutionary times, so that they reflect not only the position information of amino acids in the sequence but also the effect of mutations at amino acid sites during evolution. Based on these feature vectors, logistic regression, random forest, and M-SVMCS models are used to predict protein secondary structure on the public datasets RS126, CB513, and 25PDB. The experimental results demonstrate that the method achieves better performance than traditional methods.
Keywords: protein secondary structure prediction; multiple evolutionary matrix; logistic regression; random forest; M-SVMCS
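The feature construction described in the abstract, combining PSSM matrices of different evolutionary times into one vector per residue, might look roughly like the sketch below. The abstract does not give the details, so the sliding window, the zero padding at the sequence ends, and the names `window_features` and `sequence_features` are all assumptions made for illustration. Each PSSM is taken to be a list of per-residue score rows that has already been computed elsewhere.

```python
from typing import List

# One PSSM: one row of scores per residue (typically 20 columns).
Matrix = List[List[float]]


def window_features(pssms: List[Matrix], center: int, half_width: int) -> List[float]:
    """Concatenate windowed rows from several PSSMs into one feature vector.

    Assumption: a symmetric sliding window with zero padding beyond the
    sequence ends; the abstract does not specify the windowing scheme.
    """
    length = len(pssms[0])
    n_cols = len(pssms[0][0])
    features: List[float] = []
    for pssm in pssms:
        if len(pssm) != length:
            raise ValueError("all PSSMs must cover the same sequence")
        for pos in range(center - half_width, center + half_width + 1):
            if 0 <= pos < length:
                features.extend(pssm[pos])
            else:
                # Window position falls outside the sequence: pad with zeros.
                features.extend([0.0] * n_cols)
    return features


def sequence_features(pssms: List[Matrix], half_width: int = 6) -> List[List[float]]:
    """Build one fused feature vector for every residue in the sequence."""
    return [window_features(pssms, i, half_width) for i in range(len(pssms[0]))]
```

Each resulting vector has `len(pssms) * (2 * half_width + 1) * n_cols` entries, so adding a further evolutionary matrix lengthens the vector without changing the number of samples. The vectors could then be passed to any of the classifiers named in the abstract.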
Funding: National Natural Science Foundation of China (61375013, 61502259); Natural Science Foundation of Shandong Province (ZR2013FM020)
Cite this article:
杜月寒,鹿文鹏,刘毅慧,成金勇.基于多重进化矩阵的蛋白质特征向量构造方法.计算机系统应用,2018,27(2):180-185
DU Yue-Han, LU Wen-Peng, LIU Yi-Hui, CHENG Jin-Yong. Protein Secondary Structure Prediction Based on Multiple Evolutionary Matrix. COMPUTER SYSTEMS APPLICATIONS, 2018, 27(2): 180-185