AI-Driven NLP Framework for Intelligent Cyber Threat Detection and Textual Threat Analysis

Gowtham Reddy Kunduru

doi:10.5281/zenodo.18243040

Vol. 1 No. 1 (2026): January 2026

Artificial Intelligence : Technology

Gowtham Reddy Kunduru
Lead Software Engineer, M&T Bank Buffalo, New York, United States of America.

Published 2026-01-31

Keywords

Cybersecurity,
Cyber Threat Detection,
Natural Language Processing,
CyberBERT-LSTM,
Zero-Day Attacks

How to Cite

Gowtham Reddy Kunduru. (2026). AI-Driven NLP Framework for Intelligent Cyber Threat Detection and Textual Threat Analysis. Milestone Transactions on Artificial Intelligence, 1(1), 1–17. https://doi.org/10.5281/zenodo.18243040

Abstract

As cyber threats grow in complexity and volume, there is an urgent need for intelligent, adaptive methods that can analyze unstructured textual data. Traditional signature-based and rule-based detection mechanisms fail to detect zero-day attacks and evolving threat patterns, especially when the evidence is embedded in textual sources such as cyber threat intelligence reports, malware descriptions, vulnerability disclosures, and social media posts. This paper proposes an artificial intelligence-based Natural Language Processing (NLP) framework for intelligent cyber threat detection and textual threat analysis. The framework is built on a hybrid CyberBERT-LSTM (Cybersecurity Bidirectional Encoder Representations from Transformers, Long Short-Term Memory) model that combines transformer-based contextual features with sequential modeling to capture both the semantic context and the sequential relationships in cyber threat narratives. The model is evaluated against baseline models, including Logistic Regression, Support Vector Machines, LSTM, and BERT. It consistently outperforms all of them, achieving accuracy = 0.98, precision = 0.97, recall = 0.99, and F1-score = 0.98. Additional analyses using ROC-AUC, precision-recall curves, threshold sensitivity, and ablation studies further support the robustness, reliability, and scalability of the framework. The results underscore the value of integrating contextual intelligence with sequential modeling for detecting cyber threats in text, and position the framework as a viable solution to real-world challenges in cyber threat intelligence and security.
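For readers who want to relate the four reported scores to one another, the following is a minimal sketch of the standard binary classification metrics, not the paper's evaluation code. The function names `classification_metrics` and `f1_from_precision_recall` are illustrative, and the confusion-matrix counts in the usage lines are hypothetical values chosen only to exercise the formulas; they are not taken from the paper's experiments. The only figures drawn from the abstract are precision = 0.97 and recall = 0.99.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary metrics from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives,
    false negatives, and true negatives.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = f1_from_precision_recall(precision, recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def f1_from_precision_recall(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Consistency check using the precision and recall reported in the abstract.
reported_f1 = f1_from_precision_recall(0.97, 0.99)
print(round(reported_f1, 2))

# HYPOTHETICAL counts, not from the paper: 100 threat texts, 100 benign texts.
example = classification_metrics(tp=99, fp=3, fn=1, tn=97)
print({name: round(value, 2) for name, value in example.items()})
```

Because F1 is a harmonic mean, it always lies between precision and recall, so an F1 near 0.98 is what one would expect from the reported precision and recall.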
