Analysis of Online Consumer Behavior - Design of CRISP-DM Process Model

DOI 10.7160/aol.2020.120302
No 3/2020, September
pp. 13-22

Exenberger, E. and Bucko, J. (2020) “Analysis of Online Consumer Behavior - Design of CRISP-DM Process Model”, AGRIS on-line Papers in Economics and Informatics, Vol. 12, No. 3, pp. 13-22. ISSN 1804-1930. DOI 10.7160/aol.2020.120302.


The basis of a business entity's modern marketing is knowing how its customers behave. Advanced artificial intelligence methods, such as data mining and machine learning, are increasingly used in data analysis. These methods are best suited to online sales of large volumes of goods across many industries, and they are commonly applied to the sale of electronics, PCs, and clothing. They can, however, also be applied in the agricultural sector, not only in B2C but also in B2B sales of seeds, agricultural products, or agricultural machinery. Well-chosen combinations of offers, together with knowledge of customers, can bring the seller higher profits or a competitive advantage. The main goal of our study is to design a CRISP-DM process model that enables small businesses to analyze the behavior of their online customers. To reach this goal, we analyze online sales data using machine learning methods such as clustering, decision trees, and association rule mining. After evaluating the proposed model, we discuss its use in internet sales in the agricultural sector.
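As a minimal sketch of the association rule mining mentioned above, the following Python computes support, confidence, and lift for single-item rules over a small set of hypothetical agricultural e-shop orders. The item names, the thresholds (`min_support=0.4`, `min_confidence=0.6`), and the helper names `support` and `pair_rules` are illustrative assumptions, not data or code from the study. The exhaustive pair enumeration also stands in for the Apriori algorithm rather than implementing it.

```python
from itertools import permutations

# Hypothetical orders from an agricultural e-shop (illustrative data only).
transactions = [
    {"wheat seed", "fertilizer"},
    {"wheat seed", "fertilizer", "herbicide"},
    {"corn seed", "fertilizer"},
    {"wheat seed", "herbicide"},
    {"wheat seed", "fertilizer"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def pair_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Return rules {a} -> {b} between single items that meet both thresholds.

    Each rule is a tuple (antecedent, consequent, support, confidence, lift).
    """
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in permutations(items, 2):
        s_ab = support({a, b}, baskets)  # share of baskets holding both items
        if s_ab < min_support:
            continue
        confidence = s_ab / support({a}, baskets)  # estimate of P(b | a)
        if confidence < min_confidence:
            continue
        lift = confidence / support({b}, baskets)  # > 1: a and b co-occur more than chance
        rules.append((a, b, s_ab, confidence, lift))
    return rules


for antecedent, consequent, sup, conf, lift in pair_rules(transactions):
    print(f"{{{antecedent}}} -> {{{consequent}}}: "
          f"support={sup:.2f}, confidence={conf:.2f}, lift={lift:.2f}")
```

In this toy data, a rule such as {herbicide} -> {wheat seed} would suggest offering wheat seed to customers who buy herbicide. Apriori reaches frequent itemsets on larger data by discarding candidates that have an infrequent subset, so a real analysis would normally rely on an established Apriori or Eclat implementation instead of enumerating pairs.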


Keywords: Classification, association rules, data analysis, consumer behavior, online shopping.