7.3 TLD-NUCA
7.3.2 Operations of TLD-NUCA
7.3.2.5 TLD
The structure of TLD is explained earlier. The corresponding tld-block for a block B is referred to as tld(B). A tld-block can be in one of four states:
• Not present (NP) - the tld-block is not in TLD.
• Normal (NR) - the tld-block has a valid tag entry and also holds the bank-id of the bank where the corresponding block resides.
• Miss Lock (ML) - the block is currently being fetched from the main memory by some bank.
• Wait (WT) - the block is either under replacement or movement in some bank, and therefore any further requests (for block B) have to wait.
The state diagram of TLD for a tld-block is given in Figure 7.8. When a block (B) is first requested by a core (C), it will not be in the local bank (Lc2), which therefore sends a request to TLD (as discussed in Section 7.3.2.1). If the block is not in the cache, the initial status of the corresponding tld-block (tld(B)) is NP. TLD sends the response Miss-[Ty] to Lc2 and changes the state of tld(B) to ML. The miss lock is necessary to prevent any other bank from fetching the block simultaneously. TLD stalls any other requests for the same block until Lc2 sends Update#i-[Lc2] to Ty. Receiving this message means that Lc2 has fetched the block from main memory (or is in the process of fetching the data, with the address already allocated).
In that case the status of tld(B) is changed from ML to NR. The NR status means that block B is definitely present in some bank, called the target bank (say Lt2). Since Lc2 has just fetched the block, currently t = c. TLD can now forward any further request for B to the target bank (Lt2).
Figure 7.8: State diagram describing the status of a tld-block in TLD. The letters c, x and y are used to represent the sender banks uniquely. The corresponding actions required in each transition are not shown.
The operations of a target bank on receiving a block request from TLD are already discussed in Section 7.3.2.4. If TLD receives Wait-[Lt2] as a response from the target bank of B, then the status of tld(B) is changed to WT. It means the block is either under replacement or movement; TLD has to wait until the task is completed.
Multiple wait signals from the target bank keep tld(B) in WT. The status remains WT until a further message (Removed-[Lt2] or Update#i-[Lx2], x ≠ t) arrives from a bank with the latest status of B. Receiving Removed-[Lt2] from the target bank means the block has been removed from the cache and hence has to be removed from TLD. Receiving Update#i-[Lx2] means the block has been moved (cascaded/migrated) to Lx2 and placed in the ith way. TLD can now forward future block requests for B to Lx2: the status of tld(B) is changed to NR and Lx2 becomes the target bank.
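The transitions described in this section can be summarised as a small state machine. The following is a minimal sketch in Python, assuming one object per tld-block; the class name TldBlock, its method names, and the returned action tuples are hypothetical and are not taken from the GEMS implementation. Only the transitions stated in the text are modelled; any other event raises an error instead of guessing at Figure 7.8.

```python
from enum import Enum


class State(Enum):
    NP = "not present"
    NR = "normal"
    ML = "miss lock"
    WT = "wait"


class TldBlock:
    """Illustrative model of tld(B); names are assumptions, not thesis code."""

    def __init__(self):
        self.state = State.NP
        self.target = None  # bank currently holding B (meaningful in NR and WT)
        self.way = None     # way index i carried by the last Update#i

    def on_request(self, requester):
        """Block request from a bank; returns the action TLD takes."""
        if self.state is State.NP:
            self.state = State.ML      # lock so that no other bank fetches B
            return ("Miss", requester)
        if self.state is State.NR:
            return ("Forward", self.target)
        return ("Stall", None)         # ML or WT: the request has to wait

    def on_update(self, bank, way):
        """Update#i-[bank]: B now resides in `bank`, in way `way`."""
        if self.state not in (State.ML, State.WT):
            raise ValueError("transition not described in the text")
        self.state, self.target, self.way = State.NR, bank, way

    def on_wait(self, bank):
        """Wait-[target]: B is under replacement or movement."""
        if self.state not in (State.NR, State.WT) or bank != self.target:
            raise ValueError("transition not described in the text")
        self.state = State.WT

    def on_removed(self, bank):
        """Removed-[target]: B has left the cache, so tld(B) is dropped."""
        if self.state is not State.WT or bank != self.target:
            raise ValueError("transition not described in the text")
        self.state, self.target, self.way = State.NP, None, None
```

In this sketch a request that arrives while tld(B) is in ML or WT is only reported as stalled; how TLD queues and later replays stalled requests is left out.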
7.3.2.6 Additional conditions to be handled
The major operations of TLD-NUCA are discussed above, but there are some other conditions that need to be taken care of by the controllers. Messages communicated through the NoC may reach the destination out of order (depending on the NoC routing logic). The controllers must have a mechanism to handle out-of-order messages without any loss of consistency. For example, when a block V replaces V0 in a bank (say Lc2, with no cascading), V0 is temporarily stored in a buffer for removal and the cache space is allocated to V. Lc2 sends two separate messages to TLD: (a) Removed-[Lc2], after V0 has been completely removed from the cache, and (b) Update#i-[Lc2], after V has been placed. These two messages may arrive at TLD in either order. The two scenarios are explained below:
• If Removed-[Lc2] arrives first, the entry tld(V0) is removed from TLD. Later, when Update#i-[Lc2] arrives, tld(V) is placed in the position previously held by tld(V0).
• If Update#i-[Lc2] for block V arrives first, then to make space for tld(V) the existing tld(V0) must be stored in a temporary buffer of TLD. Later, when Removed-[Lc2] arrives, tld(V0) is deleted from TLD's temporary buffer.
Note that an additional buffer is required in the second case. In this way, out-of-order arrival of messages is handled without loss of consistency.
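The two arrival orders can be illustrated with a short sketch. It assumes, purely for illustration, that TLD keeps one entry per (bank, way) position and that both messages carry the tag of the block they refer to; the class TldSlots, its fields, and the helper run are hypothetical names and do not come from the protocol specification.

```python
class TldSlots:
    """Illustrative TLD storage: one entry per (bank, way) plus a temp buffer."""

    def __init__(self):
        self.slots = {}    # (bank, way) -> tag of the block recorded there
        self.pending = {}  # tag -> (bank, way): displaced entries awaiting Removed

    def on_update(self, bank, way, block):
        """Update#i-[bank] for `block`, placed in way `way`."""
        old = self.slots.get((bank, way))
        if old is not None and old != block:
            # Update overtook Removed: park the displaced entry in the buffer.
            self.pending[old] = (bank, way)
        self.slots[(bank, way)] = block

    def on_removed(self, bank, block):
        """Removed-[bank] for `block`."""
        if block in self.pending:
            # Second scenario: the entry is waiting in the temporary buffer.
            del self.pending[block]
            return
        # First scenario: the entry is still in its regular position.
        for key, tag in list(self.slots.items()):
            if key[0] == bank and tag == block:
                del self.slots[key]


def run(order):
    """Replay the V-replaces-V0 example with the given message order."""
    tld = TldSlots()
    tld.slots[("Lc", 2)] = "V0"  # V0 initially occupies way 2 of bank Lc
    for msg in order:
        if msg == "Removed":
            tld.on_removed("Lc", "V0")
        else:
            tld.on_update("Lc", 2, "V")
    return tld.slots, tld.pending
```

Both run(["Removed", "Update"]) and run(["Update", "Removed"]) end with V recorded at the position formerly held by V0 and an empty temporary buffer, which is the consistency property the two scenarios describe.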
It is possible that TLD forwards a block request to a target bank (Lt2) which is the same bank (Lc2) that requested the block. This situation may arise when Lc2 does not have the block initially and forwards the request to TLD, but before TLD can determine the block's actual location in the LLC, the block is moved to Lc2 due to either migration or cascading replacement. In this case the target bank need not inform the requesting core with the message Done-[Lt2] (discussed in Section 7.3.2.4), because both are the same bank.
7.4 Experimental analysis
To evaluate TLD-NUCA, simulations were performed using the multi-core cycle-accurate simulator GEMS [85], which runs on top of SIMICS [82], a full-system functional simulator. The performance of TLD-NUCA is compared with both T-SNUCA and T-DNUCA. All the TCMP architectures (T-SNUCA, T-DNUCA and TLD-NUCA) are implemented on top of the baseline TCMP shown in Figure 3.1. GEMS is capable of simulating the entire memory system of a CMP. The complete protocol discussed in Section 7.3.2 is implemented in GEMS. The configuration details of the processor, cache memory and main memory are given in Table 7.2. The PARSEC benchmarks used for this simulation are: vips, body, frrt, flud, frq, blck, mx1, mx2 and mx3. A detailed description of the benchmarks is given in Table 3.3. Each benchmark is executed on each simulated architecture to evaluate its performance.
Component                                 Parameters
No. of tiles, Processor, L1 cache size    16, UltraSPARCIII+, 64KB 4-way
Total LLC (L2) size                       2MB and 4MB
Bank size                                 256KB 4-way and 128KB 4-way
Block size, Memory bank                   64KB, 1GB 4KB/page
Router pipeline stage                     5-stage
Access latency L1/L2/TLD                  2/6/3 cycles
Table 7.2: System Parameters
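As a quick consistency check on Table 7.2, the per-bank capacity can be derived from the total LLC size. The sketch below assumes one equally sized L2 bank per tile, which is an assumption about the tiled layout and is not stated in the table; the helper name bank_size is hypothetical.

```python
KB = 1024
MB = 1024 * KB
NUM_TILES = 16  # number of tiles from Table 7.2


def bank_size(total_llc_bytes, num_banks=NUM_TILES):
    """Per-bank capacity, assuming one equally sized bank per tile."""
    return total_llc_bytes // num_banks


for total in (2 * MB, 4 * MB):
    print(total // MB, "MB LLC:", bank_size(total) // KB, "KB per bank")
```

Under this assumption the 2MB configuration corresponds to 128KB banks and the 4MB configuration to 256KB banks, matching the two bank sizes listed in the table.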