3.3 Proposed Hybrid Cache Architecture
3.3.1 Basic Organization
3.3.1.1 States added in the MESI Protocol
The main idea of the proposal is to identify private blocks and store only their tags when they are loaded from main memory on an L2 cache miss. This subsection explains the actions taken on a loaded block in response to the requests and responses generated by different cores. Table 3.1 lists the requests and responses generated by the L1 cache, each with a brief description.
In this subsection, we discuss: (1) the actions and states of blocks when they are loaded from main memory on an L2 cache miss; (2) the transitions of blocks to different states when they are replaced by the owner L1 cache or when another L1 cache issues a new request or response; and (3) the migration of blocks to SRAM once they become write-intensive. A block is considered write-intensive, and its data is migrated to SRAM, on the second write-back operation to the STT region. All cases are illustrated with the MESI protocol state diagram. Because we maintain entries without data in the STT region of the hybrid cache, we modify the MESI protocol by adding new states and their associated transitions.
The details of all the cases mentioned above are described below:
Fig. 3.7(a) shows the actions and states of blocks when they are loaded from main memory. The state diagram has two types of states: (1) tag-with-data states and (2) tag-only states. A brief description of the states is given below:
• I: Invalid entry of L2 cache.
Chapter 3. Reducing Write Cost by Dataless Entries and Prediction 63
Figure 3.7: (a) State diagram of the STT region of the cache showing the new states along with their associated transitions. (b) State diagram showing the migration process from the STT-RAM to the SRAM region.
• ST-S: An L2 cache entry in the STT region of the cache that is shared between two or more cores or that has read-only permission. Note that it is a tag with data state.
• P: An L2 cache entry in the STT region that is held exclusively (with both read and write permission) by an L1 cache. Note that it is the tag only state.
• ST-C: An L2 cache entry in the STT region of the cache (without any owner or sharer(s)). Note that it is a tag with data state.
• P’: An intermediate state. A block enters this state when a core requests it with read-only permission (GETS). If the request is served without intervention from another core or L1 cache (that is, without another read-only request), the block is served to the requestor L1 cache with exclusive permission and its state changes to P. Otherwise, if a GETS from another core arrives while the block is being loaded from main memory, the incoming block is served with read-only permission to the requesting L1 caches, and its state changes to ST-S.
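The states listed above can be summarized in a small data structure. The following is a minimal sketch, not the thesis's implementation: the names `L2State` and `storage_kind` are hypothetical, and the text classifies only ST-S, ST-C, and P as tag-with-data or tag-only, so I and P’ are deliberately left unclassified.

```python
from enum import Enum
from typing import Optional


class L2State(Enum):
    """Directory states of an L2 entry in the STT region (names are illustrative)."""
    I = "I"          # invalid entry
    ST_S = "ST-S"    # shared or read-only
    P = "P"          # held exclusively by one L1 cache
    ST_C = "ST-C"    # no owner or sharers
    P_PRIME = "P'"   # intermediate state entered on a GETS miss


# Classification taken from the state descriptions in the text.
_STORAGE = {
    L2State.ST_S: "tag with data",
    L2State.ST_C: "tag with data",
    L2State.P: "tag only",
}


def storage_kind(state: L2State) -> Optional[str]:
    """Return the storage kind, or None where the text gives no classification."""
    return _STORAGE.get(state)
```

Keeping the classification in a separate table makes it easy to add the migration states (ST-D and SR-C) described later in this section.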
(a) Actions to be taken on L2 cache miss
The transitions between states, according to the requests/responses generated by the L1 cache for a block that is not present in the L2 cache, are described below:
• When the core or L1 requests an instruction block (GET_INSTR):
The request is sent to the next level of memory, and the incoming block is loaded in the STT region of the cache. In the directory entry, the state field is changed from I to ST-S. The requested block is sent to L1.
• When the block is requested exclusively (GETX) by L1 cache: The incoming block from main memory is allocated in the STT region of the cache, but only its tag is stored: the data array is not updated and the entry remains dataless. The state of the block changes from I to P. The requested block is sent to L1.
• When the block is requested with read-only permission (GETS) by L1 cache: The state of the block is first changed to the intermediate state P’. The final state depends on whether another L1 cache intervenes with a read request. If there is no intervention, the block is served exclusively to the requestor L1 and the state changes from P’ to P. Otherwise, the block is provided with read-only permission to the requesting L1 caches and the state changes to ST-S.
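The three miss cases above can be sketched as a function that returns the sequence of states an entry passes through, starting from I. This is an illustrative sketch only: the function name `l2_miss_path` and the boolean `intervening_gets` (true when another L1 cache issues a GETS while the block is being loaded) are assumptions introduced here.

```python
from enum import Enum
from typing import List


class State(Enum):
    I = "I"
    P_PRIME = "P'"
    P = "P"
    ST_S = "ST-S"


def l2_miss_path(request: str, intervening_gets: bool = False) -> List[State]:
    """Return the states visited by an L2 entry on a miss, starting from I."""
    if request == "GET_INSTR":
        # Instruction blocks are loaded with data and marked shared.
        return [State.I, State.ST_S]
    if request == "GETX":
        # Exclusive request: only the tag is kept (dataless entry).
        return [State.I, State.P]
    if request == "GETS":
        # Read-only request: wait in P' and resolve once the load completes.
        final = State.ST_S if intervening_gets else State.P
        return [State.I, State.P_PRIME, final]
    raise ValueError(f"request not covered by this sketch: {request}")
```

For example, `l2_miss_path("GETS", intervening_gets=True)` ends in ST-S, whereas the same request without intervention ends in the dataless state P.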
(b) Actions to be taken on a block replacement from L1 cache or on a request or response from the L1 cache to the block in the L2 cache
Once the block is placed in the L2 cache, its state can change in several ways, either because of a request or response generated by an L1 cache or because the block is replaced from the L1 cache. These changes are described below:
• When the block is replaced from the L1 cache (Last PUTS or PUTX) and the state of the block in the L2 cache is ST-S: The write-back operation is scheduled according to the dirty bit of the block. Once the write-back operation is performed, the state of the block is changed to state ST-C.
• When the block is evicted from the L1 cache (PUTX) and the state of the block in the L2 cache is P: The write-back operation is scheduled irrespective of the dirty bit of the block. In this case, the L2 cache entry is dataless, and so the data has to be written regardless of its dirty status. The state of the block is changed to state ST-C.
• When the L2 cache block is in state ST-S, and the sharers of the block are reduced to one: The state of the block remains ST-S. The block can contain either fresh or stale data, and ST-S can handle both the M and S states of the MESI protocol. The way ST-S handles both M and S states is shown in the dotted ellipse in Fig. 3.7.
• When the block is requested exclusively (GETX) by some other core and the state of the block in the L2 cache is P: In this case, the request from the other core will be forwarded to the owner of the block. Then, the owner L1 will send the data to the requestor L1 and invalidate its own copy of data. In this situation, the requestor will become the new owner of the block, and the owner field in the directory is updated accordingly. The state of the block remains P.
• When the block is requested with the read-only permission (GETS) by some other core and the state of the block in the L2 cache is P:
The request from the other core will be forwarded to the owner of the block, which in turn sends the data to the requestor. In this case, instead of invalidating its own data, the owner L1 keeps its copy with shared permission. Because the block is now shared by more than one core and the L2 cache entry previously held no data, the data is also written to the L2 cache. The state of the block is changed to state ST-S.
• When the block is requested exclusively (GETX) by some core and the state of the block is P’: The write request is stalled and served later. It remains stalled until the state of the block changes to either ST-S or P.
• When the core or L1 requests an instruction block that resides in the L2 cache with state ST-C: The instruction request is served by providing the block to the requestor L1 with shared permission. The state of the block is changed to state ST-S.
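The cases in (b) can be collected into a lookup table keyed by the current L2 state and the incoming event. This is a hedged sketch rather than the protocol specification: the table name, the event labels `LAST_PUTS` and `PUTS` (used here for the last and a non-last sharer replacement, respectively), and the action strings are assumptions, and only the transitions described in the text are included.

```python
# (current L2 state, event) -> (next L2 state, action taken at the L2/directory)
TRANSITIONS = {
    ("ST-S", "LAST_PUTS"): ("ST-C", "write back if dirty"),
    ("ST-S", "PUTX"): ("ST-C", "write back if dirty"),
    ("P", "PUTX"): ("ST-C", "write back always (entry was dataless)"),
    ("ST-S", "PUTS"): ("ST-S", "remove one sharer"),
    ("P", "GETX"): ("P", "forward to owner; requestor becomes new owner"),
    ("P", "GETS"): ("ST-S", "forward to owner; write data into L2"),
    ("P'", "GETX"): ("P'", "stall until state is ST-S or P"),
    ("ST-C", "GET_INSTR"): ("ST-S", "send block with shared permission"),
}


def next_state(state: str, event: str):
    """Look up the next state and action; raise if the text does not cover the case."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"transition not covered by this sketch: {state}, {event}") from None


def must_write_l2(state: str, dirty: bool) -> bool:
    """Decide whether an L1 replacement must write data into the L2 entry."""
    if state == "P":
        return True   # dataless entry: data is written regardless of the dirty bit
    if state == "ST-S":
        return dirty  # entry already holds data: write only if the block is dirty
    raise ValueError(f"state not covered by this sketch: {state}")
```

The `must_write_l2` helper captures the one asymmetry between the two replacement cases: a block leaving state P is always written, because the L2 entry holds no data to fall back on.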
(c) Migration of Block to SRAM
Our policy migrates a block to the SRAM region when the second write-back operation is performed on that block in the L2 cache. Fig. 3.7(b) presents the migration process from STT to SRAM. The states used in the state diagram are described below:
• ST-D: An L2 cache entry that resides in the STT region and is a potential candidate for migration. Note that it is the tag with data state.
• SR-C: An L2 cache entry residing in the SRAM region with or without owner/sharer(s). The state SR-C is used here as an abstraction, as the SRAM region follows the normal MESI protocol. Note that the state is a tag with data state.
The actions to be taken while migrating the block are described below:
• When an L2 cache block having no owner/sharer(s) (state ST-C) gets a read or write request (GETS or GETX): In this case, the block becomes a potential candidate for migration due to prospective multiple write requests. The state of the block is changed to ST-D, and the data is sent to the requestor L1.
• When an L2 cache block receives the write-back response and the state of the block is ST-S: In this case, the written-back block becomes a potential candidate for migration, and the state of the block is changed to ST-D.
• When a block in the L2 cache gets a read or write request and the state of the block in the L2 cache is ST-D: The request is treated in the same way as in state ST-S. The state of the block remains ST-D, and it can handle both the M and S states of the MESI protocol, as shown in the dotted ellipse in Fig. 3.7.
• When a block receives the write-back response (PUTX, WB_DATA) and the state of the block in the L2 cache is ST-D: In this case, the migration of the block from STT to SRAM is executed in two steps.
First, a lookup for an invalid entry is performed in the SRAM region. If there is no invalid entry, the LRU victim is selected from the same cache set in the SRAM region, and a write-back of the victim to the next level of memory is scheduled. Second, the write-back response is redirected to the available entry in the SRAM region by transferring the tag from the STT-RAM to the SRAM region. Afterward, the block is invalidated in STT-RAM. Once the write-back operation is performed, the state of the block in the SRAM region is changed to state SR-C. The state SR-C can act as the S or I state (in the case of WB_DATA) or the M state (in the case of PUTX) of the MESI protocol. For simplicity, we denote all of these by the single state SR-C in the state diagram.
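The two-step migration described above can be sketched as follows. This is an illustration under stated assumptions, not the simulator code: the `Entry` record, the `last_use` timestamp used to pick the LRU victim, and the function `migrate_to_sram` are all hypothetical. The function returns the evicted victim, if any, so that the caller can schedule its write-back to the next level of memory.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class Entry:
    tag: Optional[int] = None
    valid: bool = False
    state: str = "I"
    last_use: int = 0            # hypothetical LRU timestamp; smaller means older
    data: Optional[bytes] = None


def migrate_to_sram(sram_set: List[Entry], stt_entry: Entry,
                    wb_data: bytes, now: int) -> Optional[Entry]:
    """Move an ST-D block into the SRAM ways of the same set; return the victim, if any."""
    if stt_entry.state != "ST-D":
        raise ValueError("only blocks in state ST-D are migrated")

    # Step 1: look for an invalid SRAM entry; otherwise pick the LRU victim.
    slot = next((e for e in sram_set if not e.valid), None)
    victim = None
    if slot is None:
        slot = min(sram_set, key=lambda e: e.last_use)
        victim = replace(slot)   # copy, so the caller can write it back

    # Step 2: redirect the write-back to the SRAM entry and transfer the tag.
    slot.tag = stt_entry.tag
    slot.valid = True
    slot.state = "SR-C"
    slot.last_use = now
    slot.data = wb_data

    # Finally, invalidate the block in the STT region.
    stt_entry.valid = False
    stt_entry.state = "I"
    stt_entry.data = None
    return victim
```

Returning the victim rather than writing it back inside the function keeps the sketch independent of how the next level of memory is modeled.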
The benefit of these additional states (shown in Fig. 3.7) is that they represent the dataless entries (P) in the STT region and show the states of the MESI protocol in an abstract manner (I, ST-S, ST-C, ST-D, and SR-C) according to the different regions of the HCA. The overhead is that the dataless and intermediate states must be incorporated into the directory structure of the protocol, which adds a few bits, 5 cycles to search the directory, and the extra changes required to maintain the dataless entries. We have considered all these overheads in the simulations.