An Efficient Machine Learning Framework for Behavioral Insider Threat Detection with Comparative Analysis of Ensemble Methods.

J Sivarani; R Bhargava Reddy; C Sai Sreeja; R Keerthi Priya; S Munendra

doi:10.5281/zenodo.18836210

Authors

J Sivarani, Department of CSE (IoT and Cyber Security including Block Chain Technology), Annamacharya Institute of Technology & Sciences (Autonomous), Tirupati, A.P., India.
R Bhargava Reddy, Department of CSE (IoT and Cyber Security including Block Chain Technology), Annamacharya Institute of Technology & Sciences (Autonomous), Tirupati, A.P., India.
C Sai Sreeja, Department of CSE (IoT and Cyber Security including Block Chain Technology), Annamacharya Institute of Technology & Sciences (Autonomous), Tirupati, A.P., India.
R Keerthi Priya, Department of CSE (IoT and Cyber Security including Block Chain Technology), Annamacharya Institute of Technology & Sciences (Autonomous), Tirupati, A.P., India.
S Munendra, Department of CSE (IoT and Cyber Security including Block Chain Technology), Annamacharya Institute of Technology & Sciences (Autonomous), Tirupati, A.P., India.

Keywords:

Insider threats, machine learning, Decision Tree, Random Forest, XGBoost, threat detection

Abstract

Insider threats, in which users with legitimate access and authorization misuse their privileges, pose a major challenge to current cybersecurity systems. Because insiders already hold valid access and authorization, detecting them is more complex than detecting attacks by external users. This study addresses insider threat detection with three efficient machine learning classifiers: Decision Tree, Random Forest, and XGBoost. These classifiers were chosen because they handle large volumes of data and can detect a range of insider threats by identifying patterns or anomalies in user access behavior. The Decision Tree is efficient and easy to interpret, whereas the Random Forest combines the predictions of many individual decision trees to achieve higher accuracy. XGBoost, owing to its speed and accuracy, processes larger volumes of data efficiently, which makes it scalable and well suited to insider threat detection. Experimental results indicate that XGBoost is the most accurate of the three, reaching 98% accuracy on complex threat scenarios, while Random Forest and Decision Tree yield good results while saving resources.
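The way an ensemble of trees combines individual predictions can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, replaces full decision trees with one-split "stumps", and runs on invented toy data. The feature names (after-hours logins, emails sent), the class sizes, and all function names are assumptions made for illustration, and the accuracy printed is measured on the training data itself, so it says nothing about the results reported in the abstract.

```python
import random
from collections import Counter


def train_stump(rows, labels):
    """Fit a one-split tree: predict 1 (threat) when row[feature] > threshold."""
    best = (len(rows) + 1, 0, 0.0)  # (errors, feature index, threshold)
    for feature in range(len(rows[0])):
        values = sorted({row[feature] for row in rows})
        # Candidate thresholds are midpoints between consecutive observed values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            errors = sum((row[feature] > threshold) != bool(label)
                         for row, label in zip(rows, labels))
            if errors < best[0]:
                best = (errors, feature, threshold)
    return best[1], best[2]


def predict_stump(stump, row):
    feature, threshold = stump
    return int(row[feature] > threshold)


def train_forest(rows, labels, n_trees, rng):
    """Bagging: each stump is fitted on its own bootstrap resample."""
    forest = []
    for _ in range(n_trees):
        picks = [rng.randrange(len(rows)) for _ in rows]
        forest.append(train_stump([rows[i] for i in picks],
                                  [labels[i] for i in picks]))
    return forest


def predict_forest(forest, row):
    """Majority vote over the individual stump predictions."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


def make_toy_data(rng, n_benign=180, n_malicious=20):
    """Hypothetical features: [after-hours logins per week, emails sent per day].

    Only the first feature separates the classes; the second is pure noise.
    """
    rows, labels = [], []
    for _ in range(n_benign):
        rows.append([rng.randint(0, 3), rng.randint(0, 10)])
        labels.append(0)
    for _ in range(n_malicious):
        rows.append([rng.randint(8, 12), rng.randint(0, 10)])
        labels.append(1)
    return rows, labels


rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows, labels = make_toy_data(rng)
forest = train_forest(rows, labels, n_trees=15, rng=rng)
predictions = [predict_forest(forest, row) for row in rows]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(f"training accuracy of the toy ensemble: {accuracy:.2f}")
```

An odd number of stumps is used so that the vote cannot tie. A realistic comparison of the three classifiers would rely on established library implementations, a held-out test set, and metrics suited to imbalanced classes, none of which this sketch attempts.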
