Big Data Storage Optimization Techniques in Distributed Environments

Authors

  • Bhavna Lakra Khan, Savitribai Phule Pune University, Pune, India

DOI:

https://doi.org/10.15662/IJARCST.2024.0705001

Keywords:

Big Data, Storage Optimization, Distributed Environments, HDFS, Multi-Agent Systems, Software-Defined Storage, SSD Caching, Data Movement, JVM Heap, 2023

Abstract

Efficient storage optimization remains crucial in distributed big data systems as data volume and velocity continue to rise. This 2023 study surveys strategies for improving storage performance, reducing energy use, and lowering cost in distributed environments. Drawing on recent research on multi-agent systems in HDFS, software-defined storage, JVM-SSD hybrid configurations, data movement techniques, and emerging abstractions, we present a comprehensive analysis and evaluation. Our methodology synthesizes developments and experimental insights from these papers: a multi-agent Hadoop framework that dynamically classifies hot and cold data for replication and compression; distributed in-memory platforms that use SSD-backed caching to reduce shuffle-related performance bottlenecks; software-defined storage (SDS) as an abstraction for dynamic resource reallocation; and data movement optimizations such as data partitioning, compression, and cache-oblivious algorithms that reduce latency and improve access efficiency. The findings indicate that multi-agent HDFS approaches yield notable gains in storage utilization, energy savings, and handling of hot/cold data patterns [1]. SSD-assisted caching paired with JVM adjustments mitigates shuffle spill and accelerates Spark workloads [2]. SDS frameworks offer modular scaling and automated tiering for heterogeneous storage landscapes [3, 4]. In addition, smart data movement strategies such as compression, partitioning, and cache-aware placement substantially reduce data transfer overhead in analytics pipelines [5]. In conclusion, while individual optimization techniques deliver measurable benefits, an integrated, adaptive framework that combines them is essential for optimal results. Future work should explore automated orchestration, energy-aware storage consolidation, and edge/fog-based hierarchies to further improve scalability and performance.
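
As a rough illustration of the hot/cold classification idea mentioned in the abstract, the following Python sketch labels files by recent access count and assigns a replication factor and a compression flag. It is not the framework from the cited work: the threshold, the replication values, and the names FileStats, Placement, classify, and plan are all hypothetical and chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical policy values, assumed for illustration only.
HOT_ACCESS_THRESHOLD = 10  # accesses within the observation window
HOT_REPLICATION = 3        # more replicas for frequently read data
COLD_REPLICATION = 2       # fewer replicas for rarely read data


@dataclass
class FileStats:
    path: str
    accesses_in_window: int


@dataclass
class Placement:
    path: str
    tier: str          # "hot" or "cold"
    replication: int
    compress: bool


def classify(stats: FileStats, threshold: int = HOT_ACCESS_THRESHOLD) -> Placement:
    """Label a file hot or cold from its recent access count and pick a policy."""
    if stats.accesses_in_window >= threshold:
        # Hot data: keep uncompressed and more widely replicated for fast reads.
        return Placement(stats.path, "hot", HOT_REPLICATION, compress=False)
    # Cold data: compress and reduce replicas to save space.
    return Placement(stats.path, "cold", COLD_REPLICATION, compress=True)


def plan(files: Iterable[FileStats]) -> List[Placement]:
    """Build a placement plan for a batch of files."""
    return [classify(f) for f in files]


if __name__ == "__main__":
    for p in plan([FileStats("/logs/a", 25), FileStats("/archive/b", 1)]):
        print(p)
```

In a real deployment the access counts would come from cluster monitoring, and the resulting plan would be applied through the storage system's own replication and compression controls; a single fixed threshold is the simplest possible rule and is used here only for readability.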

References

1. Sais, M., Rafalia, N., Mahdaoui, R., & Abouchabaka, J. (2023). Distributed storage optimization using multi-agent systems in Hadoop. E3S Web of Conferences, 412, 01091.

2. (2023). Optimization Techniques for a Distributed In-Memory Computing Platform by Leveraging SSD. Applied Sciences, MDPI.

3. (2023). Enterprise data management: storage optimization tips and software-defined storage. Data Storage Tech, datastoragetech.com.

4. (2023). Software-defined storage. Wikipedia.

5. (2023). Investigating techniques to optimize data movement and reduce memory-related bottlenecks. Applied and Computational Engineering, Ewa Direct.

6. Dehghani, Z. (2022). Data mesh framework and data governance paradigms, applied in 2023. Wikipedia.

7. (2023). Optimizing Data Processing: A Comparative Study of Big Data Platforms in Edge, Fog, and Cloud Layers. Applied Sciences, MDPI.

8. (2023). Edge computing concept and efficiency advantages. Wikipedia.

Published

2024-09-01

How to Cite

Big Data Storage Optimization Techniques in Distributed Environments. (2024). International Journal of Advanced Research in Computer Science & Technology (IJARCST), 7(5), 10907-10910. https://doi.org/10.15662/IJARCST.2024.0705001