
7.3 TLD-NUCA

7.3.2 Operations of TLD-NUCA

7.3.2.5 TLD

The structure of TLD was explained earlier. The tld-block corresponding to a block B is referred to as tld(B). A tld-block can be in any of four states:

• Not present (NP) - the tld-block is not in TLD.

• Normal (NR) - the tld-block has a valid tag entry and holds the bank-id of the bank where the corresponding block resides.

• Miss Lock (ML) - the block is currently being fetched from main memory by some bank.

• Wait (WT) - the block is either under replacement or movement in some bank, and therefore any further requests for block B have to wait.
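The four states and the fields of a tld-block can be written down as a small data structure. This is only a minimal sketch: the names `TldState`, `TldBlock`, `tag`, `bank_id` and `can_forward` are hypothetical and do not come from the TLD-NUCA implementation, and the sketch assumes that the tag and bank-id are meaningful only in the NR state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TldState(Enum):
    NP = "not present"   # tld-block is not in TLD
    NR = "normal"        # valid tag and bank-id
    ML = "miss lock"     # block is being fetched from main memory
    WT = "wait"          # block is under replacement or movement


@dataclass
class TldBlock:
    """One tld-block (hypothetical layout, for illustration only)."""
    state: TldState = TldState.NP
    tag: Optional[int] = None       # assumed meaningful only in NR
    bank_id: Optional[int] = None   # bank where the block resides (NR)

    def can_forward(self) -> bool:
        # Only in NR does the entry name a bank that holds the block.
        return self.state is TldState.NR and self.bank_id is not None
```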

The state diagram of TLD for a tld-block is given in Figure 7.8. When a block B is first requested by a core C, it is not in the core's local bank (L^c₂), so the bank sends a request to TLD (as discussed in Section 7.3.2.1). If the block is not in the cache, the initial status of the corresponding tld-block, tld(B), is NP. TLD sends the response Miss-[T^y] to L^c₂ and changes the state of tld(B) to ML. The miss lock is necessary to prevent any other bank from fetching the block simultaneously. TLD stalls any other requests for the same block until L^c₂ sends Update#i-[L^c₂] to T^y. Receiving this message from L^c₂ means that L^c₂ has fetched the block from main memory (or is in the process of fetching the data but has already allocated the address).

Figure 7.8: State diagram describing the status of a tld-block in TLD. The letters c, x and y are used to identify the sender banks uniquely. The actions required in each transition are not shown.

In that case the status of tld(B) is changed from ML to NR. A status of NR means that block B is definitely present in some bank, called the target bank (say L^t₂). Since L^c₂ has just fetched the block, currently t = c. TLD can now forward any further request for B to the target bank (L^t₂).

The operations of a target bank on receiving a block request from TLD were discussed in Section 7.3.2.4. If TLD receives Wait-[L^t₂] as a response from the target bank of B, the status of tld(B) is changed to WT. This means the block is either under replacement or movement; TLD has to wait until the task is completed.

Multiple wait signals from the target bank keep tld(B) in WT. The status remains WT until a further message (Removed-[L^t₂] or Update#i-[L^x₂], x ≠ t) arrives from a bank with the latest status of B. Receiving Removed-[L^t₂] from the target bank means the block has been removed from the cache, and hence tld(B) has to be removed from TLD. Receiving Update#i-[L^x₂] means the block has been moved (cascaded/migrated) to L^x₂ and placed in the i^th way. The status of tld(B) is then changed to NR, L^x₂ becomes the target bank, and TLD can forward future requests for B to L^x₂.
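The transitions described in this section can be summarised as a small state machine for a single tld-block. The sketch below is an illustration under stated assumptions, not the protocol as implemented in GEMS: the class `TldEntry`, its method names and the returned action tuples are hypothetical, banks are identified by plain integers, and only the transitions spelled out in the text are modelled. Any other message/state combination (Figure 7.8 may contain more) raises an error.

```python
from enum import Enum


class State(Enum):
    NP = "not present"
    ML = "miss lock"
    NR = "normal"
    WT = "wait"


class TldEntry:
    """State of tld(B) for one block B (hypothetical helper class)."""

    def __init__(self):
        self.state = State.NP
        self.target = None  # id t of the target bank; set when entering NR

    def on_request(self, bank):
        """A bank asks TLD for B; returns the action TLD takes."""
        if self.state is State.NP:
            self.state = State.ML           # lock: nobody else may fetch B
            return ("miss", bank)           # Miss reply, requester fetches B
        if self.state is State.NR:
            return ("forward", self.target) # forward to the target bank
        return ("stall", bank)              # ML or WT: request must wait

    def on_update(self, bank):
        """Update#i from `bank`: B now resides in that bank."""
        moved = self.state is State.WT and bank != self.target  # x != t
        if self.state is State.ML or moved:
            self.state = State.NR
            self.target = bank
        else:
            raise ValueError(f"Update in {self.state.name} not modelled")

    def on_wait(self, bank):
        """Wait from the target bank: B is being replaced or moved."""
        if self.state in (State.NR, State.WT) and bank == self.target:
            self.state = State.WT           # repeated Wait keeps WT
        else:
            raise ValueError(f"Wait in {self.state.name} not modelled")

    def on_removed(self, bank):
        """Removed from the target bank: B has left the cache."""
        if self.state is State.WT and bank == self.target:
            self.state = State.NP           # tld(B) is dropped from TLD
            self.target = None
        else:
            raise ValueError(f"Removed in {self.state.name} not modelled")


if __name__ == "__main__":
    e = TldEntry()
    print(e.on_request(0))  # ('miss', 0)
    print(e.on_request(3))  # ('stall', 3)
    e.on_update(0)          # bank 0 fetched B, so t = 0
    print(e.on_request(3))  # ('forward', 0)
```

What happens to a stalled request once the entry returns to NR (for example, whether TLD replays it or the requester retries) is not covered by this sketch.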

7.3.2.6 Additional conditions to be handled

The major operations of TLD-NUCA are discussed above, but there are some other conditions that need to be taken care of by the controllers. Messages communicated through the NoC may reach their destination out of order (depending on the NoC routing logic). The controllers must have a mechanism to handle out-of-order messages without any loss of consistency. For example, when a block V replaces V′ in a bank (say L^c₂, with no cascading), V′ is temporarily stored in a buffer for removal and its cache space is allocated to V. L^c₂ sends two separate messages to TLD: (a) Removed-[L^c₂], after V′ has been completely removed from the cache, and (b) Update#i-[L^c₂], after V has been placed. These two messages may reach TLD in either order. The two scenarios are explained below:

• If Removed-[L^c₂] arrives first, the entry tld(V′) is removed from TLD. Later, when Update#i-[L^c₂] arrives, tld(V) is placed in the position previously held by tld(V′).

• If Update#i-[L^c₂] for block V arrives first, then to make space for tld(V), the existing tld(V′) must be stored in a temporary buffer of TLD. Later, when Removed-[L^c₂] arrives, tld(V′) is deleted from TLD's temporary buffer.

Note that an additional buffer is required in the second case. The out-of-order arrival of all messages is handled correctly.
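The two arrival orders can be sketched as follows to show that both end in the same TLD state. This is a simplified model resting on several assumptions that are not in the text: TLD is modelled with one slot per (bank, way) pair, both messages are assumed to carry the way and the block tag, and the names `TldSlots`, `slots`, `pending` and `run` are hypothetical.

```python
class TldSlots:
    """Hypothetical TLD model: one slot per (bank, way) plus a temp buffer."""

    def __init__(self):
        self.slots = {}    # (bank, way) -> tag of the block tracked there
        self.pending = {}  # (bank, way) -> displaced tag awaiting Removed

    def on_update(self, bank, way, new_tag):
        key = (bank, way)
        old = self.slots.get(key)
        if old is not None and old != new_tag:
            # Update overtook Removed: park the old entry in the buffer.
            self.pending[key] = old
        self.slots[key] = new_tag

    def on_removed(self, bank, way, old_tag):
        key = (bank, way)
        if self.pending.get(key) == old_tag:
            del self.pending[key]   # Update came first: drop buffered entry
        elif self.slots.get(key) == old_tag:
            del self.slots[key]     # Removed came first: free the slot


def run(order):
    """Replay the two messages in the given order; return the final state."""
    tld = TldSlots()
    tld.slots[(2, 1)] = "V_old"     # evicted block is tracked initially
    for msg in order:
        if msg == "removed":
            tld.on_removed(2, 1, "V_old")
        else:
            tld.on_update(2, 1, "V")
    return tld.slots, tld.pending


if __name__ == "__main__":
    print(run(["removed", "update"]) == run(["update", "removed"]))  # True
```

In both orders the slot ends up tracking the new block and the temporary buffer is empty; the buffer is used only when the Update message arrives first.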

It is possible that TLD forwards a block request to a target bank (L^t₂) that is the same bank (L^c₂) that requested the block. This situation may arise when L^c₂ does not have the block initially and forwards the request to TLD, but before TLD can determine the block's actual location in the LLC, the block is moved to L^c₂ by either migration or cascading replacement. In this case the target bank need not inform the requesting core with the message Done-[L^t₂] (discussed in Section 7.3.2.4), because both are the same bank.

7.4 Experimental analysis

To evaluate TLD-NUCA, simulations were performed using the multi-core cycle-accurate simulator GEMS [85], which runs on top of SIMICS [82], a full-system functional simulator. The performance of TLD-NUCA is compared with both T-SNUCA and T-DNUCA. All the TCMP architectures (T-SNUCA, T-DNUCA and TLD-NUCA) are implemented on top of the baseline TCMP shown in Figure 3.1. GEMS is capable of simulating the entire memory system of a CMP. The complete protocol discussed in Section 7.3.2 is implemented in GEMS. The configuration details of the processor, cache memory and main memory are given in Table 7.2. The PARSEC benchmarks used for this simulation are: vips, body, frrt, flud, frq, blck, mx1, mx2 and mx3. Detailed descriptions of the benchmarks are given in Table 3.3. Each benchmark is executed on each simulated architecture to evaluate its performance.

Component                                 Parameters
No. of tiles, Processor, L1 cache size    16, UltraSPARCIII+, 64KB 4-way
Total LLC (L2) size                       2MB and 4MB
Bank size                                 256KB 4-way and 128KB 4-way
Block size, Memory bank                   64KB, 1GB 4KB/page
Router pipeline stage                     5-stage
Access latency L1/L2/TLD                  2/6/3 cycles

Table 7.2: System Parameters
