The work needed to improve this research is summarized below.
• The pages of each process should be analyzed to determine whether they reside in DRAM or in PM (a minimal sketch of such a check follows this list).
• Experiments should be carried out in an environment where multiple processes run concurrently.
• Based on the analysis, a more sophisticated MCAM policy should be implemented.
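
As a first step toward the page-placement analysis in the list above, the following is a minimal sketch (not part of the thesis) of how DRAM versus PM residency could be checked per process on Linux. It assumes PM is exposed as CPU-less NUMA nodes (for example via KMEM DAX), so per-node page counts can be read from /proc/<pid>/numa_maps; the PM_NODES node IDs and the script name are hypothetical placeholders.

#!/usr/bin/env python3
# numa_placement.py: summarize, per NUMA node, how many pages a process maps.
# Assumption (not from the thesis): PM appears as CPU-less NUMA nodes, so
# DRAM vs. PM placement can be derived from /proc/<pid>/numa_maps, where each
# mapping line reports page counts as fields of the form "N<node>=<pages>".
import re
import sys
from collections import Counter

PM_NODES = {2, 3}  # hypothetical: NUMA node IDs backed by persistent memory

def pages_per_node(pid: int) -> Counter:
    """Return the total number of mapped pages per NUMA node for a process."""
    counts = Counter()
    with open(f"/proc/{pid}/numa_maps") as f:
        for line in f:
            # Fields like "N0=1234" give the pages of this mapping on node 0.
            for node, pages in re.findall(r"\bN(\d+)=(\d+)", line):
                counts[int(node)] += int(pages)
    return counts

if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit("usage: numa_placement.py <pid>")
    pid = int(sys.argv[1])
    counts = pages_per_node(pid)
    dram = sum(p for n, p in counts.items() if n not in PM_NODES)
    pm = sum(p for n, p in counts.items() if n in PM_NODES)
    total = dram + pm or 1
    print(f"PID {pid}: DRAM pages={dram} ({100*dram/total:.1f}%), "
          f"PM pages={pm} ({100*pm/total:.1f}%)")

Extending such a per-process breakdown to per-mapping granularity (for example, separating hot and cold regions) would provide the input needed for the more sophisticated MCAM policy mentioned above.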
IX Conclusion
Persistent memory (PM), which offers characteristics similar to DRAM at much lower cost, is now being deployed in real systems. By configuring systems in hybrid form, with both PM and DRAM, demands for high memory capacity can be satisfied cost-effectively. However, if memory management is not done judiciously, it can have detrimental effects on performance. Hence, the characteristics in which PM falls short of DRAM, such as its latency and bandwidth limitations, have required researchers to study judicious-use policies that improve system-wide efficiency. Such policies become especially important when PM is applied to NUMA (Non-Uniform Memory Access) systems, where PM performance becomes even more sensitive than DRAM to local and remote accesses. Previous studies have referred to this memory environment as a tiered memory environment, and recent studies have been conducted to optimize performance in this environment. While early results have proposed ways to optimize performance in hybrid tiered memory environments, in this thesis we consider the effect of multithreading and the level of concurrency, which, to the best of our knowledge, have not been considered elsewhere. We implement the MCAM policy, a multithreading- and concurrency-aware policy. Testing the performance of the PM distribution system under the newly implemented MCAM policy yielded a 38% improvement over the existing system's policy.
Acknowledgements
We would like to thank Prof. Sam H. Noh, Dr. Hyeonho Song, and Dr. Hyunsub Song, who greatly helped in completing this thesis.