
A Page Scheduler using Machine Learning for Hybrid Memory Systems


We propose SMA, an RNN-based page scheduler that learns page access patterns and ensures that pages an application will access in the near future are staged in fast memory in advance. SMA reduces training time and memory usage compared to existing state-of-the-art machine-learning-based page schedulers while providing higher accuracy. We also show that a single RNN model can learn general page access patterns, achieving accuracy similar to the existing page scheduler even for applications not included in the model's training dataset.

Figure 1 gives an overview of our hybrid page scheduler SMA, which combines intelligent page management with an existing history-based page scheduler. The page scheduler migrates hot pages, those that applications access frequently, into fast memory so that the pages about to be accessed are already resident there. A perfect page scheduler behaves as if the memory system had the speed of the small, fast memory and the capacity of the large, slow memory.

The history-based page scheduler assumes that past access patterns will repeat in the future. The Oracle page scheduler knows future access patterns and migrates pages to fast memory before they are accessed. Kleio [3], an existing state-of-the-art machine-learning-based memory page scheduler, observed that only a small set of pages is critical among the hundreds of thousands of pages an application touches.
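To make the contrast concrete, the following minimal sketch (hypothetical function and variable names, assuming per-epoch page-access traces are available) selects hot pages from the previous epoch for the history-based scheduler, while the oracle applies the same selection to the accesses of the coming epoch.

```python
from collections import Counter

def history_schedule(prev_epoch_accesses, fast_capacity_pages):
    """Pick pages for fast memory assuming the last epoch repeats.

    prev_epoch_accesses: iterable of page ids accessed in the previous epoch.
    fast_capacity_pages: number of pages that fit in fast memory.
    """
    counts = Counter(prev_epoch_accesses)
    # Keep the most frequently accessed (hottest) pages of the last epoch.
    return {page for page, _ in counts.most_common(fast_capacity_pages)}

def oracle_schedule(next_epoch_accesses, fast_capacity_pages):
    """Idealized oracle: the same policy, but applied to the future epoch."""
    counts = Counter(next_epoch_accesses)
    return {page for page, _ in counts.most_common(fast_capacity_pages)}
```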

This work proposes SMA, a page scheduler that minimizes overhead with a single RNN model managing all important pages.

Hybrid Memory System

Recurrent Neural Networks

Kleio trains one LSTM model per page to learn that page's access pattern. Each LSTM model receives the sequence of access counts of its page over previous scheduling epochs and predicts the number of accesses to that page in the next epoch. Pages are ranked by a benefit factor:

Benefit factor = (number of accesses) × (number of misplacements)    (2)

A page therefore becomes more important the more often it is accessed and the more often it is misplaced by the history-based page scheduler, that is, the harder its access pattern is to predict.
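As an illustration of how Equation 2 can be used to rank pages, here is a small sketch; the per-page statistics and identifiers are assumptions for illustration, not the paper's implementation.

```python
def benefit_factor(num_accesses: int, num_misplacements: int) -> int:
    # Equation (2): pages that are accessed often AND often misplaced by the
    # history-based scheduler benefit most from intelligent management.
    return num_accesses * num_misplacements

def select_important_pages(page_stats, k):
    """Rank pages by benefit factor and keep the top-k.

    page_stats: dict mapping page id -> (num_accesses, num_misplacements);
    the names and the dict layout are illustrative.
    """
    ranked = sorted(page_stats,
                    key=lambda p: benefit_factor(*page_stats[p]),
                    reverse=True)
    return ranked[:k]
```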

Since the relation between the fraction of pages managed by the RNN model and the performance improvement is nonlinear, intelligently managing a few pages can achieve a performance improvement similar to intelligently managing all pages. However, training one model per page is difficult to deploy in practice because training takes far longer than a scheduling epoch. Therefore, Coeus groups pages with the same access pattern and reduces overhead by sharing one RNN model per group.

Can a single model predict future memory accesses for all pages with high accuracy? If the learning capacity of one RNN model is sufficient, it should be possible to learn multiple access patterns, and accuracy may be improved further by increasing the size of the training dataset.

If multiple page access patterns can be learned, the trained model can also be applied to unseen applications that are not included in the training dataset. Is it then really possible for one model to learn the access patterns of all pages? Since the number of pages an application accesses at runtime is in the hundreds of thousands, learning the access patterns of all pages is difficult.

This is because the memory usage and inference latency of an RNN model grow with the learning capacity required to cover all pages. The RNN model must therefore be shared across the pages of multiple applications, but it does not need to learn the access patterns of every page. The inference plane uses the trained RNN model to predict the next access counts of the important pages.

Inference Plane

The training plane learns from additional memory access traces collected while the application runs, so that the model becomes more accurate for that application.

Page Selector

Figure 2 shows the DRAM hit rate according to the number of misplaced pages managed by the Oracle page scheduler. Misplaced pages are pages that the history-based page scheduler placed in NVM at least once when they should have been placed in DRAM.

The benefit factor used to prioritize pages is formulated in Equation 2. As the number of pages managed by the Oracle page scheduler increases, the DRAM hit rate increases non-linearly. Thus, by managing only a few important pages, an application can achieve a performance improvement similar to having the Oracle page scheduler manage all pages.

The total number of pages accessed differs per application, and so does the fraction of pages misplaced by the history-based scheduler. The performance improvement depends on what fraction of the misplaced pages the Oracle page scheduler manages. Therefore, the page selector observes the DRAM hit rate at runtime and increases the number of managed pages whenever the hit rate falls below the target level.
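A minimal sketch of this runtime feedback loop follows; the step size, the target hit rate, and the cap at the total number of misplaced pages (discussed next) are illustrative assumptions rather than the paper's parameters.

```python
def adjust_managed_pages(num_managed, dram_hit_rate, target_hit_rate,
                         total_misplaced, step=10):
    """Runtime feedback loop of the page selector (illustrative sketch).

    If the observed DRAM hit rate is below the target, manage more pages with
    the RNN model; never exceed the total number of misplaced pages, since
    managing more than that yields no further improvement.
    """
    if dram_hit_rate < target_hit_rate and num_managed < total_misplaced:
        num_managed = min(num_managed + step, total_misplaced)
    return num_managed
```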

If the number of managed pages exceeds the total number of misplaced pages, no further performance improvement can be obtained, so the page selector does not increase the number of pages beyond that point.

LSTM Model

The LSTM model that manages the important pages is pre-trained on collected memory access traces. Since all page access patterns can be learned with a single model, the trained SMA model can be applied to unseen applications without any offline work.

The model is then periodically updated with additionally trained weights, improving accuracy by learning from memory traces collected at runtime. Because SMA uses the same architecture as Coeus, its model size and inference time are the same as those of Coeus. The SMA model takes the series of access counts of the important pages as input and computes the outputs for multiple series simultaneously.
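The sketch below shows one way a single shared LSTM could consume batched per-page access-count series and predict the next count for every important page at once; the layer sizes and the use of PyTorch are assumptions for illustration, not the configuration reported in this work.

```python
import torch
import torch.nn as nn

class SharedAccessLSTM(nn.Module):
    """One LSTM shared by all important pages (illustrative configuration)."""
    def __init__(self, hidden_size=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size,
                            batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x: (num_pages, history_len, 1) access counts of each important page
        out, _ = self.lstm(x)
        # Predict the next-epoch access count from the last hidden state.
        return self.head(out[:, -1, :])

# One forward pass predicts the next access count for all pages in the batch.
model = SharedAccessLSTM()
history = torch.rand(100, 16, 1)   # 100 important pages, 16 past epochs
next_counts = model(history)       # shape: (100, 1)
```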

Figure 2: DRAM hit rate according to the number of misplaced pages managed by the Oracle page scheduler.

Training Plane

Applications

Simulated Hybrid Memory System

LSTM Model Implementation

Dataset for LSTM Model

The total memory usage of each application equals 4 KB times its number of pages. The number of scheduling epochs equals the total number of memory requests divided by the number of memory requests per epoch, rounded up. First, we compare the accuracy of SMA with Kleio under three different selections of unseen applications.
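The two bookkeeping formulas above can be written directly as code; the example numbers below are made up for illustration.

```python
import math

PAGE_SIZE_BYTES = 4 * 1024  # 4 KB pages

def total_memory_bytes(num_pages: int) -> int:
    return num_pages * PAGE_SIZE_BYTES

def num_epochs(total_requests: int, requests_per_epoch: int) -> int:
    # Rounded up, so a final partial epoch still counts.
    return math.ceil(total_requests / requests_per_epoch)

# Example with made-up numbers:
print(total_memory_bytes(200_000))    # 819_200_000 bytes (~781 MiB)
print(num_epochs(1_000_000, 30_000))  # 34 epochs
```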

Model Accuracy

SMA predicts the future access count of each page using a single RNN model trained on the access patterns of the 100 important pages of 6 applications. The scheduling epoch was set to the dominant reuse length of each application using Cori [10]. For the first combination of three unseen applications, SMA achieves higher accuracy than Kleio for 5 out of 6 seen applications and for 2 out of 3 unseen applications.

In the second combination, SMA achieves higher accuracy than Kleio for 5 out of 6 seen applications and 1 out of 3 unseen applications. In the third combination, SMA achieves higher accuracy than Kleio for 6 out of 6 seen applications and 2 out of 3 unseen applications. The average over applications of (average MAE of SMA per application / average MAE of Kleio per application) × 100% is 70.7%, so SMA provides higher accuracy than Kleio.

The corresponding average ratio of SMA's MAE to Kleio's MAE for the unseen applications is 91.7%. This means SMA provides accuracy comparable to Kleio and has learned common page access patterns, so it can also be applied to unseen applications.
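Under one reading of the metric above, the reported percentages are the average over applications of the per-application MAE ratio; a small sketch of that computation with hypothetical values follows.

```python
def avg_mae_ratio_percent(sma_mae_per_app, kleio_mae_per_app):
    """Average over applications of (SMA MAE / Kleio MAE) * 100%.

    Values below 100% mean SMA's prediction error is lower than Kleio's.
    """
    ratios = [s / k * 100.0
              for s, k in zip(sma_mae_per_app, kleio_mae_per_app)]
    return sum(ratios) / len(ratios)

# Hypothetical per-application MAE values, just to show the call:
print(avg_mae_ratio_percent([1.2, 0.8, 2.0], [1.5, 1.3, 2.4]))  # ~75.0
```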

Figure 4: Mean absolute error (MAE) for 100 important pages of SMA and Kleio for the second combination.

Training Time

Memory Usage

Discussion

This work uses a single RNN model to predict future page accesses and uses the predictions to place the pages that will be accessed in the fast memory of a hybrid memory system. Meswani [1] proposed a page scheduler for hybrid memory systems that does not use machine learning. Meswani introduces a hotness threshold to avoid the overhead of sorting pages by access count.

Like this work, Kleio [3] and Coeus [4] proposed page schedulers that use machine learning in a hybrid memory system. Kleio predicts the number of accesses in the next epoch by learning previous access patterns with one model per page. Kleio also observed that a set of pages is critical to application performance and used this observation to reduce the overhead of training and inference with RNN models.

The embedding LSTM receives the program counter and the address delta of the current time step as input, performs classification over the 50,000 most common unique delta values, and prefetches the K highest-probability deltas. SMA does not consider the effect that the number of pages managed by the RNN model has on the actual application execution time. It is therefore necessary to analyze how the actual execution time changes with the number of pages managed by the RNN model and to determine, at runtime, the number of important pages that minimizes the actual execution time.

Frequently updating the model with newly trained weights can increase accuracy by adapting responsively to the application, thus reducing the overall execution time. It is therefore important to observe the effect of the model update cycle on the total execution time and choose a cycle that is responsive enough without incurring significant overhead. We demonstrate that SMA learns the memory access patterns of multiple pages, achieving higher accuracy for seen applications than Kleio, the state-of-the-art machine-learning-based page scheduler.

SMA reduces training time and memory consumption compared to Coeus, a previous study that reduced the number of required models by grouping pages with the same access pattern.


Figures and Tables

Figure 1: Overview of our hybrid page scheduler SMA, which combines intelligent page management with an existing history-based page scheduler.
Figure 2: DRAM hit rate according to the number of misplaced pages managed by the Oracle page scheduler.
Figure 3: Mean absolute error (MAE) for 100 important pages of SMA and Kleio for the first combination of selecting three unseen applications.
Table 1: Applications used for training and testing. The total memory usage of each application equals 4 KB times its number of pages.

References

M. Hashemi et al., “Learning memory access patterns,” in Proceedings of the 35th International Conference on Machine Learning (ICML), 2018.
T. D. Doudali et al., “Kleio: A hybrid memory page scheduler with machine intelligence,” in Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2019.
T. D. Doudali et al., “Cori: Dancing to the beat of periodic data movements over hybrid memory systems,” in 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2021.
G. Singh et al., “Sibyl: Adaptive and extensible data placement in hybrid storage systems using online reinforcement learning,” in Proceedings of the 49th Annual International Symposium on Computer Architecture (ISCA), 2022.
