UX and Machine Learning – Preprocessing of Audiovisual Data Using Computer Vision to Recognize UI Elements

DOI 10.7160/aol.2023.150304
No 3/2023, September
pp. 35-44

Čejka, M., Masner, J., Jarolímek, J., Benda, P., Prokop, M., Šimek, P. and Šimek, P. (2023) "UX and Machine Learning – Preprocessing of Audiovisual Data Using Computer Vision to Recognize UI Elements", AGRIS on-line Papers in Economics and Informatics, Vol. 15, No. 3, pp. 35-44. ISSN 1804-1930. DOI 10.7160/aol.2023.150304.


This study explores the convergence of user experience (UX) and machine learning, specifically the use of computer vision to preprocess audiovisual data and detect user interface (UI) elements. Focusing on usability testing, it introduces a novel approach for recognizing changes of UI screens in video recordings. The methodology comprises form prototype creation, laboratory experiments, data analysis, and computer vision tasks; the longer-term aim is to automate the evaluation of user behavior during UX testing. The approach is relevant to agriculture, where specialized applications for precision agriculture, subsidy requests, and production reporting demand streamlined usability. The study introduces a frame extraction algorithm that identifies screen changes by analyzing pixel differences between consecutive frames, and it employs YOLOv7, an efficient object detection model, to identify UI elements within the video frames. The results show successful screen change detection with minimal false negatives and an acceptable number of false positives, indicating potential for greater automation in UX testing. The implications lie in simplifying analysis, improving the insights available for design decisions, and fostering user-centric advances in diverse sectors, including precision agriculture.
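The abstract does not give the details of the frame extraction algorithm, so the following is only a minimal sketch of the general idea of detecting screen changes from pixel differences between consecutive frames. Everything specific here is an assumption for illustration: frames are represented as flattened grayscale `bytes` objects, the function names `changed_fraction`, `detect_screen_changes`, and `count_errors` are hypothetical, and the thresholds `pixel_threshold` and `change_ratio` are arbitrary values, not the ones used in the paper.

```python
from typing import List, Sequence, Tuple


def changed_fraction(prev: bytes, curr: bytes, pixel_threshold: int = 25) -> float:
    """Fraction of pixels whose grayscale value differs by more than pixel_threshold."""
    if len(prev) != len(curr):
        raise ValueError("frames must have the same number of pixels")
    if not prev:
        return 0.0
    changed = sum(1 for a, b in zip(prev, curr) if abs(a - b) > pixel_threshold)
    return changed / len(prev)


def detect_screen_changes(
    frames: Sequence[bytes],
    pixel_threshold: int = 25,   # assumed value: ignores small per-pixel noise
    change_ratio: float = 0.2,   # assumed value: share of pixels that must change
) -> List[int]:
    """Return every index i at which frame i differs enough from frame i - 1."""
    return [
        i
        for i in range(1, len(frames))
        if changed_fraction(frames[i - 1], frames[i], pixel_threshold) > change_ratio
    ]


def count_errors(detected: Sequence[int], actual: Sequence[int]) -> Tuple[int, int]:
    """Return (false_positives, false_negatives) against hand-labelled change indices."""
    detected_set, actual_set = set(detected), set(actual)
    return len(detected_set - actual_set), len(actual_set - detected_set)


# Toy recording: four 2x2 grayscale frames, with one real screen change at index 2.
screen_a = bytes([10, 10, 10, 10])
screen_a_noise = bytes([12, 9, 10, 11])   # same screen with slight noise
screen_b = bytes([200, 200, 10, 10])      # half of the pixels change
frames = [screen_a, screen_a_noise, screen_b, screen_b]

changes = detect_screen_changes(frames)
false_positives, false_negatives = count_errors(changes, actual=[2])
```

In a real pipeline the frames would be decoded from the screen recording with a video library, and the two thresholds would need tuning: a lower `change_ratio` reduces missed screen changes (false negatives) at the cost of more spurious detections (false positives), which is the trade-off the reported results refer to.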


Keywords: Usability, UX, audiovisual data, computer vision, frame extraction, object detection, YOLOv7, precision agriculture.


