INTEGRATIVE MACHINE LEARNING AND NATURAL LANGUAGE PROCESSING FRAMEWORK FOR THE DETECTION AND CLASSIFICATION OF AI-GENERATED TEXTS IN ACADEMIC RESEARCH PUBLICATIONS

Authors

  • Mukhriddin Mukhiddinov (Nordic International University; Tashkent University of Information Technologies Named After Muhammad Al-Khwarizmi)
  • Kungratov Ilmurod Kuzibay ugli (Nordic International University)
  • Mirzarahmatov Shahzodbek Ulugbekovich (Nordic International University)
  • Sanakulov Otaniyoz Erjigit ugli (Nordic International University)

Keywords:

AI-generated text detection, academic integrity, machine learning, natural language processing, BERT-based models, Kaggle datasets, ensemble classification, Google Colab experiments

Abstract

The rapid proliferation of large language models (LLMs) has transformed academic writing and scientific communication, but it has also raised serious concerns about the authenticity and integrity of scholarly publications. This study proposes an integrative Machine Learning (ML) and Natural Language Processing (NLP) framework for detecting and classifying AI-generated text in academic research articles. The framework combines linguistic, semantic, and stylometric features extracted from large open-source datasets collected from Kaggle and other academic repositories. The analytical pipeline, implemented and validated in Google Colab, integrates classical supervised models (Support Vector Machines (SVM) and Random Forest) with fine-tuned BERT-derived transformer architectures. In the experiments, the ensemble classifier achieves an overall F1-score of 0.94 and generalizes well to unseen AI-generated texts. The proposed model outperforms existing AI-text detectors in cross-domain evaluations and remains robust against paraphrased and hybrid human-AI compositions. The findings contribute to the development of transparent and reliable tools for preserving academic integrity in the era of generative artificial intelligence.
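
The abstract names three ingredients without giving their details: stylometric features, an ensemble over several classifiers, and the F1-score used for evaluation. The sketch below illustrates each in plain Python under stated assumptions. The feature set (average sentence length, type-token ratio, average word length), the soft-voting rule, and the function names stylometric_features, soft_vote, and f1_score are illustrative choices, not the authors' implementation, and the probabilities in the usage note are made-up placeholders rather than results from the study.

```python
import re
from statistics import mean


def stylometric_features(text):
    """Compute three simple stylometric features for one text.

    Assumption: the paper does not list its features; these are common
    stylometric measures chosen only for illustration.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words or not sentences:
        return {"avg_sentence_len": 0.0, "type_token_ratio": 0.0, "avg_word_len": 0.0}
    return {
        # Mean number of words per sentence.
        "avg_sentence_len": len(words) / len(sentences),
        # Distinct words divided by total words (lexical diversity).
        "type_token_ratio": len(set(words)) / len(words),
        # Mean number of characters per word.
        "avg_word_len": mean(len(w) for w in words),
    }


def soft_vote(probabilities, weights=None):
    """Weighted average of per-model P(AI-generated) scores.

    Assumption: soft voting is one possible ensemble rule; the abstract
    does not say how the SVM, Random Forest, and BERT outputs are combined.
    """
    if weights is None:
        weights = [1.0] * len(probabilities)
    return sum(p * w for p, w in zip(probabilities, weights)) / sum(weights)


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

As a usage illustration, soft_vote([0.9, 0.7, 0.8]) averages three hypothetical model scores into one ensemble probability, which can then be thresholded (for example at 0.5) to produce the labels passed to f1_score. In practice the per-model scores would come from trained classifiers, for example scikit-learn estimators and a fine-tuned transformer, which are omitted here to keep the sketch self-contained.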

Published

2025-12-15

How to Cite

INTEGRATIVE MACHINE LEARNING AND NATURAL LANGUAGE PROCESSING FRAMEWORK FOR THE DETECTION AND CLASSIFICATION OF AI-GENERATED TEXTS IN ACADEMIC RESEARCH PUBLICATIONS. (2025). DIGITAL TRANSFORMATION AND ARTIFICIAL INTELLIGENCE, 3(6), 51-61. https://dtai.tsue.uz/index.php/dtai/article/view/v3i68