Assessing the Impact of Data Compression Algorithms on Database Performance and Scalability in Cloud Computing Platforms

Authors

  • Chandan Hegde, Gowda Nisarga Chandrashekhar, Hema S; Dept. of MCA, Surana College (Autonomous), Bengaluru, Karnataka, India

DOI:

https://doi.org/10.15662/IJARCST.2025.0805015

Keywords:

Cloud-Based Database, Machine Learning, Scalability, Query Performance, Storage Optimization

Abstract

Managing cloud-based databases effectively requires careful planning for storage usage, query performance, and scalability. Data compression is an important way to reduce storage costs, but the effectiveness of a given algorithm depends on factors such as dataset size, workload patterns, and available resources. This research examines six compression algorithms, namely Gzip, LZ4, Zstandard (Zstd), Snappy, Bzip2, and LZMA, on small, medium, and large datasets. The analysis measured compression ratio, compression and decompression time, CPU and memory usage, throughput, query latency, and storage savings. To support decision-making, machine learning models were trained on the experimental data: Random Forest, XGBoost, LightGBM, CatBoost, and ensemble methods (Voting, Stacking). The resulting system recommends the most appropriate algorithm for a given dataset, enabling intelligent, adaptive compression decisions. ROC curves, confusion matrices, and SHAP-based feature analysis were used to evaluate and interpret the models. Integration with PostgreSQL was used to validate query performance on compressed data, confirming practical applicability. The findings show that adaptive, ML-based selection of compression algorithms improves efficiency, query performance, and scalability compared with static techniques. This work highlights the promise of combining compression and machine learning to enable smart, resource-conscious cloud data management.
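
The kind of measurement the abstract describes (compression ratio, compression and decompression time, storage savings) can be illustrated with a minimal Python sketch. It is not the authors' benchmark: it covers only the codecs in the Python standard library (Gzip, Bzip2, LZMA), since LZ4, Zstd, and Snappy require third-party packages, and the function names, the synthetic payload, and the row counts standing in for "small", "medium", and "large" are all assumptions made for illustration.

```python
import bz2
import gzip
import lzma
import time

# Standard-library codecs only; LZ4, Zstd, and Snappy need third-party packages.
CODECS = {
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def benchmark(data: bytes) -> dict:
    """Return compression ratio, storage savings, and timings per codec."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        start = time.perf_counter()
        packed = compress(data)
        compress_s = time.perf_counter() - start

        start = time.perf_counter()
        restored = decompress(packed)
        decompress_s = time.perf_counter() - start

        # Lossless codecs must reproduce the input exactly.
        if restored != data:
            raise ValueError(f"{name} round trip failed")

        results[name] = {
            "ratio": len(data) / len(packed),
            "savings": 1 - len(packed) / len(data),
            "compress_s": compress_s,
            "decompress_s": decompress_s,
        }
    return results


def sample_rows(n: int) -> bytes:
    """Build a hypothetical, repetitive CSV-like payload of n rows."""
    return "".join(f"{i},user_{i % 50},active\n" for i in range(n)).encode()


if __name__ == "__main__":
    # Row counts are illustrative; the source does not state dataset sizes.
    for label, n in [("small", 1_000), ("medium", 20_000), ("large", 200_000)]:
        for codec, m in benchmark(sample_rows(n)).items():
            print(f"{label:6} {codec:5} ratio={m['ratio']:.2f} "
                  f"comp={m['compress_s']:.4f}s decomp={m['decompress_s']:.4f}s")
```

Per-codec rows like these, collected across many datasets, are the sort of table a classifier could be trained on to recommend an algorithm. CPU and memory usage, throughput, and query latency would need additional instrumentation that this sketch leaves out.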

Published

2025-10-03

How to Cite

Assessing the Impact of Data Compression Algorithms on Database Performance and Scalability in Cloud Computing Platforms. (2025). International Journal of Advanced Research in Computer Science & Technology (IJARCST), 8(5), 12864-12873. https://doi.org/10.15662/IJARCST.2025.0805015