Quality-Assured Predictive Threat Intelligence in Cloud–Lakehouse Systems: Bayesian Statistical Learning and AI-Augmented Risk Monitoring for Data-Limited Environments

Authors

  • Daniel Javier González Torres, Software Developer, Spain

DOI:

https://doi.org/10.15662/IJARCST.2025.0806017

Keywords:

Quality assurance, Bayesian statistical learning, threat intelligence, cloud–lakehouse architecture, AI-augmented risk monitoring, uncertainty quantification, anomaly detection, data-limited environments, model calibration, drift detection, cyber risk analytics

Abstract

Ensuring reliable and trustworthy cyber-threat detection in modern cloud–lakehouse infrastructures is increasingly challenging due to sparse labeling, heterogeneous telemetry, and rapidly shifting behavioral patterns. This work introduces Quality-Assured Predictive Threat Intelligence, a Bayesian–AI hybrid framework designed to deliver calibrated, explainable, and continuously validated risk monitoring in data-limited environments. The framework integrates hierarchical Bayesian statistical learning with foundation-model–based feature enrichment to infer threat probabilities under uncertainty, while generative imputations and contextual embeddings strengthen signal fidelity when data is incomplete or noisy. A built-in quality assurance (QA) layer provides systematic validation of data quality, model stability, drift resilience, and risk-score calibration through probabilistic diagnostics, posterior predictive checks, and automated performance gates. Streaming lakehouse pipelines support real-time inference, enabling early identification of cloud-native threats such as identity misuse, lateral movement, and anomalous service interactions. Empirical evaluations across hybrid synthetic–real telemetry show that the QA-enhanced Bayesian–AI approach improves reliability, reduces false positives, and maintains predictive robustness under distributional drift and label scarcity. This framework offers a transparent, auditable, and scalable pathway for operationalizing high-assurance threat intelligence within cloud–lakehouse ecosystems.
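
As a rough illustration of two ideas named in the abstract (inferring threat probabilities under label scarcity, and gating deployment on risk-score calibration), the sketch below uses a conjugate Beta-Binomial update with a shared prior and an expected calibration error (ECE) check. It is not the authors' implementation: the shared Beta(1, 9) prior is a simplified stand-in for full hierarchical modeling, and the names `BetaPosterior`, `pooled_posterior`, `expected_calibration_error`, `calibration_gate`, the bin count, and the `max_ece` threshold are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BetaPosterior:
    """Beta distribution over one entity's threat probability."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def pooled_posterior(alerts: int, confirmed: int,
                     prior_alpha: float = 1.0,
                     prior_beta: float = 9.0) -> BetaPosterior:
    """Conjugate update: shared prior plus this entity's labeled alerts.

    With few labels the estimate stays close to the prior mean; with many
    labels it moves toward the observed rate. The prior values are assumed.
    """
    if not 0 <= confirmed <= alerts:
        raise ValueError("confirmed must be between 0 and alerts")
    return BetaPosterior(prior_alpha + confirmed,
                         prior_beta + (alerts - confirmed))


def expected_calibration_error(probs: Sequence[float],
                               labels: Sequence[int],
                               n_bins: int = 5) -> float:
    """Weighted mean gap between predicted risk and observed threat rate."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in last bin
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_p = sum(p for p, _ in members) / len(members)
        freq = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(avg_p - freq)
    return ece


def calibration_gate(probs: Sequence[float], labels: Sequence[int],
                     max_ece: float = 0.15) -> bool:
    """Pass only if calibration error is within the assumed threshold."""
    return expected_calibration_error(probs, labels) <= max_ece
```

In this sketch an entity with no labeled alerts receives the prior mean as its risk score, and a scoring model would be promoted only when `calibration_gate` returns `True` on held-out labeled data.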

Published

2025-11-26

How to Cite

Quality-Assured Predictive Threat Intelligence in Cloud–Lakehouse Systems: Bayesian Statistical Learning and AI-Augmented Risk Monitoring for Data-Limited Environments. (2025). International Journal of Advanced Research in Computer Science & Technology (IJARCST), 8(6), 13200-13207. https://doi.org/10.15662/IJARCST.2025.0806017