Algorithm Implementation of Variable Order Markov Model
CSTR:
Author:
  • Article
  • | |
  • Metrics
  • |
  • Reference [17]
  • |
  • Related
  • |
  • Cited by
  • | |
  • Comments
    Abstract:

    It is of great significance how to model and mine historical data quickly and effectively. Based on the statistical characteristics of Markov model, this study designs and implements a variable order Markov model based on suffix array and suffix automata, in view of the limitations of the model in practical data mining applications. Based on the realization of suffix tree structure, the suffix chain is introduced to realize the quick jump of each state subsequence, and the requirement of different order length probability can be dynamically and adaptively calculated. The experimental results show that compared with the traditional Markov model, the model constructs the link between suffix sequence characteristics of probability and statistics of historical data and the state in linear time and space complexity, which can greatly reduce the storage space and time, and realize online learning and application of large data.

    Reference
    1 Wang BN, Hu YH, Shou GC, et al. Trajectory prediction in campus based on Markov chains. In:Wang Y, Yu G, Zhang YY, et al. eds. Big Data Computing and Communications. Cham:Springer, 2016.
    2 Illescas G, Martínez M, Mora-Soto A, et al. How to think like a data scientist:Application of a variable order Markov model to indicators management. In:Mejia J, Munoz M, Rocha Á, et al. eds. Trends and Applications in Software Engineering. Cham:Springer, 2016.
    3 Goreac D, Kobylanski M, Martinez M. A piecewise deterministic markov toy model for traffic/maintenance and associated Hamilton-Jacobi integrodifferential systems on networks. Applied Mathematics & Optimization, 2016, 74(2):375-421.
    4 Asahara A, Maruyama K, Sato A, et al. Pedestrian-movement prediction based on mixed Markov-chain model. Proceedings of the 19 th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. Chicago, IL, USA. 2011. 25-33.
    5 Gambs S, Killijian MO, del Prado Cortez MN. Next place prediction using mobility Markov chains. Proceedings of the 1st Workshop on Measurement, Privacy, and Mobility. Bern, Switzerland. 2012. 3.
    6 Rissanen J. A universal data compression system. IEEE Transactions on Information Theory, 1983, 29(5):656-664.[DOI:10.1109/TIT.1983.1056741]
    7 Ron D, Singer Y, Tishby N. Learning probabilistic automata with variable memory length. Proceedings of the 7 th Annual Conference on Computational Learning Theory. New Brunswick, NJ, USA. 1994. 35-46.
    8 Chin YS, Chen TL. Minimizing variable selection criteria by Markov chain Monte Carlo. Computational Statistics, 2016, 31(4):1263-1286.[DOI:10.1007/s00180-016-0649-3]
    9 Melikov AZ, Ponomarenko LA, Bagirova SA. Markov models of queueing-inventory systems with variable order size. Cybernetics and Systems Analysis, 2017, 53(3):373-386.[DOI:10.1007/s10559-017-9937-3]
    10 Nagata Y. Population diversity measures based on variable-order Markov models for the traveling salesman problem. Proceedings of the 14 th International Conference on Parallel Problem Solving from Nature. Edinburgh, UK. 2016. 973-983.
    11 Mao B, Cao J, Wu ZA, et al. Predicting driving direction with weighted Markov model. In:Zhou SG, Zhang SM, Karypis G, eds. Advanced Data Mining and Applications. Berlin Heidelberg:Springer, 2012. 407-418.
    12 Chen M, Liu Y, Yu XH. Predicting next locations with object clustering and trajectory clustering. In:Cao T, Lim EP, Zhou ZH, et al. eds. Advances in Knowledge Discovery and Data Mining. Cham:Springer, 2015. 344-356.
    13 Bejerano G, Yona G. Variations on probabilistic suffix trees:Statistical modeling and prediction of protein families. Bioinformatics, 2001, 17(1):23-43.[DOI:10.1093/bioinformatics/17.1.23]
    14 Leonardi FG. A generalization of the PST algorithm:Modeling the sparse nature of protein sequences. Bioinformatics, 2006, 22(11):1302-1307.[DOI:10.1093/bioin formatics/btl088]
    15 Apostolico A, Bejerano G. Optimal amnesic probabilistic automata or how to learn and classify proteins in linear time and space. Journal of Computational Biology, 2004, 7(3-4):381-393.
    16 Lin J, Adjeroh D, Jiang BH. Probabilistic suffix array:Efficient modeling and prediction of protein families. Bioinformatics, 2012, 28(10):1314-1323.[DOI:10.1093/bioin formatics/bts121]
    17 Lothaire M. Applied Combinatorics on Words. Cambridge:Cambridge University Press, 2005.
    Related
    Cited by
    Comments
    Comments
    分享到微博
    Submit
Get Citation

王兴,吴艺,林劼,卓一帆.变阶马尔科夫模型算法实现.计算机系统应用,2018,27(4):10-17

Copy
Share
Article Metrics
  • Abstract:2057
  • PDF: 3133
  • HTML: 1562
  • Cited by: 0
History
  • Received:August 17,2017
  • Revised:September 15,2017
  • Online: April 03,2018
Article QR Code
You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-3
Address:4# South Fourth Street, Zhongguancun,Haidian, Beijing,Postal Code:100190
Phone:010-62661041 Fax: Email:csa (a) iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063