MODAL GRAPH TRANSFORMER FOR GROUP ENGAGEMENT IN ONLINE MULTIPARTY CONVERSATIONS
Keywords:
hop-level positional encoding, Graph Transformer, group engagement, multimodal learning, social signal processing, collaborative learning
Abstract
Online multiparty conversations, such as virtual classrooms, remote collaboration meetings, and live discussions, pose unique challenges for understanding group engagement. Traditional models typically rely on isolated participant features or unimodal data, and so fail to capture the rich relational dynamics across modalities. In this work, we propose an approach that models group engagement as a dynamic, multimodal graph learning problem. Our framework introduces a Multimodal Graph Transformer (MGT) that combines audio-visual fusion with structure-aware graph attention. Each participant is represented as a graph node enriched with fused video and speech embeddings, while edges capture interaction intensity through gaze, turn-taking, and vocal overlap. To preserve graph structure, we incorporate hop-level positional encodings and restrict attention to the top-k neighbors of each node for scalable relational modeling. The architecture is designed to capture both localized interaction cues and global group dynamics without requiring pre-defined templates or scripted behavior. By integrating techniques from graph representation learning, multimodal attention, and social signal processing, our method offers a generalizable and theoretically grounded framework for engagement estimation in complex multiparty scenarios.
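To make the attention mechanism described above concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the hop-level positional encoding acts as one scalar bias per hop distance added to the attention logits, that top-k selection is applied to those biased logits, and that the fused node embeddings serve directly as queries, keys, and values with no learned projections. The names `hop_distances`, `sparse_hop_attention`, `hop_bias`, and the toy data are hypothetical.

```python
import math
from collections import deque

def hop_distances(weights):
    """Breadth-first hop counts between participants; None means unreachable."""
    n = len(weights)
    dist = [[None] * n for _ in range(n)]
    for src in range(n):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in range(n):
                # An edge exists wherever the interaction intensity is positive.
                if weights[u][v] > 0 and dist[src][v] is None:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist

def sparse_hop_attention(features, weights, hop_bias, k):
    """Attend from each node to its k best-scoring nodes within len(hop_bias)-1 hops."""
    n, d = len(features), len(features[0])
    dist = hop_distances(weights)
    max_hop = len(hop_bias) - 1
    outputs, attended = [], []
    for i in range(n):
        logits = {}
        for j in range(n):
            h = dist[i][j]
            if h is None or h > max_hop:
                continue  # outside the structural neighborhood
            dot = sum(a * b for a, b in zip(features[i], features[j]))
            # Scaled dot product plus the (assumed) per-hop scalar bias.
            logits[j] = dot / math.sqrt(d) + hop_bias[h]
        top = sorted(logits, key=logits.get, reverse=True)[:k]
        peak = max(logits[j] for j in top)
        exp = {j: math.exp(logits[j] - peak) for j in top}  # stable softmax
        total = sum(exp.values())
        outputs.append(
            [sum(exp[j] / total * features[j][c] for j in top) for c in range(d)]
        )
        attended.append(sorted(top))
    return outputs, attended

# Toy example: four participants linked in a chain 0-1-2-3.
# Edge weights stand in for interaction intensity (gaze, turn-taking, overlap).
weights = [
    [0.0, 0.9, 0.0, 0.0],
    [0.9, 0.0, 0.4, 0.0],
    [0.0, 0.4, 0.0, 0.7],
    [0.0, 0.0, 0.7, 0.0],
]
# Stand-ins for fused audio-visual embeddings (2-dimensional for readability).
features = [[1.0, 0.0], [0.8, 0.2], [0.1, 0.9], [0.0, 1.0]]
hop_bias = [0.5, 0.0, -1.0]  # hypothetical biases for hops 0, 1, and 2

outputs, attended = sparse_hop_attention(features, weights, hop_bias, k=2)
for i, (nodes, out) in enumerate(zip(attended, outputs)):
    print(i, nodes, [round(x, 3) for x in out])
```

In this sketch each participant attends to at most k nodes, so the per-node cost after the neighborhood search grows with k rather than with the group size. In a trained model the hop biases would be learned parameters and the embeddings would pass through learned projections.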
License
Copyright (c) 2025 Pirimqulova Zilola Avaz qizi, Hojiyev Sunatullo Nasridin o‘g‘li, Xo‘jamqulov Abdulaziz Xazrat o‘g‘li

This work is licensed under a Creative Commons Attribution 4.0 International License.







