Performance Monitoring
3.2 Instruction Cache
3.2.1 Operation When Instruction Cache is Enabled
3.2.1.3 Instruction-Cache Coherence
To achieve cache coherence, instruction cache contents can be invalidated after code modification in external memory is complete.
If the instruction cache is not enabled, or code is being written to a non-cacheable region, software must still invalidate the instruction cache before using the newly-written code. This precaution ensures that state associated with the new code is not buffered elsewhere in the processor, such as the fetch buffers or the BTB.
Naturally, when writing code as data, care must be taken to force it completely out of the processor into external memory before attempting to execute it. If writing into a non-cacheable region, flushing the write buffers is sufficient precaution (see “Register 7: Cache Functions” on page 81 for a description of this operation). If writing to a cacheable region, then the data cache should be submitted to a Clean/Invalidate operation (see “Cacheability” on page 63) to ensure coherency.
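The ordering described above can be summarized as a small decision routine. The following C sketch is only a host-side illustration: the step names and the function plan_code_write_steps are hypothetical, and the actual CP15 commands are the ones listed in the tables referenced above. It assumes the only input is whether the target region is cacheable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical step labels. The real operations are CP15 commands
 * documented in the referenced tables and are not reproduced here. */
typedef enum {
    STEP_CLEAN_INVALIDATE_DCACHE, /* push code written as data out of the data cache */
    STEP_DRAIN_WRITE_BUFFER,      /* flush buffered writes to external memory */
    STEP_INVALIDATE_ICACHE        /* discard stale instruction-side state */
} coherence_step;

/* Fills steps[] (capacity >= 2) with the operations to perform after
 * writing code as data, in order, and returns how many were written. */
size_t plan_code_write_steps(bool region_is_cacheable,
                             coherence_step steps[], size_t capacity)
{
    assert(capacity >= 2);
    size_t n = 0;

    /* First force the new code out to external memory. */
    steps[n++] = region_is_cacheable ? STEP_CLEAN_INVALIDATE_DCACHE
                                     : STEP_DRAIN_WRITE_BUFFER;

    /* Then invalidate the instruction cache. This is required even when
     * the instruction cache is disabled or the region is non-cacheable. */
    steps[n++] = STEP_INVALIDATE_ICACHE;
    return n;
}
```

In both cases the instruction cache invalidate comes last, because it must follow completion of the code modification in external memory.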
After reset, the instruction cache is always disabled, unlocked, and invalidated (flushed).
The instruction cache is enabled by setting bit 12 in coprocessor 15, register 1 (Control Register). This process is illustrated in Example 5, Enabling the Instruction Cache.
Example 4. Recovering from an Instruction Cache Parity Error
; Prefetch abort handler
MCR P15,0,R0,C7,C5,0 ; Invalidate the instruction cache and branch target buffer
CPWAIT               ; Wait for effect (see “Additions to CP15 Functionality”
                     ; on page 153 for a description of CPWAIT)
SUBS PC,R14,#4       ; Return to the instruction that generated the parity error
; The instruction cache is guaranteed to be invalidated at this point
The entire instruction cache, along with the fetch buffers, is invalidated by writing to coprocessor 15, register 7. (See Table 18, “Cache Functions” on page 81 for the exact command.) The invalidate command neither unlocks nor invalidates lines that are locked in the instruction cache. To invalidate the entire cache including locked lines, execute the unlock instruction cache command before the invalidate command. The unlock command is listed in Table 20, “Cache Lock-Down Functions” on page 83.
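The interaction between the invalidate and unlock commands can be illustrated with a small host-side model. This C sketch is not processor code: the types and function names are hypothetical, and the 32-set by 32-way geometry is assumed purely for illustration. It models only the valid and locked state of each line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { ICACHE_SETS = 32, ICACHE_WAYS = 32 }; /* assumed geometry, for illustration */

typedef struct {
    bool valid;
    bool locked;
} icache_line;

typedef struct {
    icache_line line[ICACHE_SETS][ICACHE_WAYS];
} icache_model;

/* Invalidate command: clears every unlocked line and leaves locked
 * lines both valid and locked. */
void icache_invalidate_all(icache_model *cache)
{
    assert(cache != NULL);
    for (int set = 0; set < ICACHE_SETS; set++) {
        for (int way = 0; way < ICACHE_WAYS; way++) {
            if (!cache->line[set][way].locked) {
                cache->line[set][way].valid = false;
            }
        }
    }
}

/* Unlock command: releases every lock without touching the valid flags. */
void icache_unlock_all(icache_model *cache)
{
    assert(cache != NULL);
    for (int set = 0; set < ICACHE_SETS; set++) {
        for (int way = 0; way < ICACHE_WAYS; way++) {
            cache->line[set][way].locked = false;
        }
    }
}

/* Counts the lines that still hold valid contents. */
int icache_count_valid(const icache_model *cache)
{
    assert(cache != NULL);
    int count = 0;
    for (int set = 0; set < ICACHE_SETS; set++) {
        for (int way = 0; way < ICACHE_WAYS; way++) {
            count += cache->line[set][way].valid ? 1 : 0;
        }
    }
    return count;
}
```

In this model, calling icache_invalidate_all alone leaves locked lines in place; calling icache_unlock_all first and then icache_invalidate_all empties the cache.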
There is an inherent delay between execution of the instruction cache invalidate command and the point at which the next instruction sees the result of the invalidate. The routine in Example 6, Invalidating the Instruction Cache, can be used to guarantee proper synchronization.
The Intel XScale processor also supports invalidating an individual line from the instruction cache. See Table 18, “Cache Functions” on page 81 for the exact command.
Software has the ability to lock performance-critical routines into the instruction cache.
Up to 28 lines in each set can be locked; hardware will ignore the lock command if software is trying to lock all the lines in a particular set (i.e., ways 28-31 can never be locked). When all ways in a particular set are requested to be locked, the instruction cache line will still be allocated into the cache but the lock will be ignored. The round-robin pointer will stay at way 31 for that set.
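The per-set lock limit can be modeled in a few lines. The C sketch below is a host-side illustration with hypothetical names; it tracks only how many ways of one set are locked and does not model the round-robin pointer itself.

```c
#include <assert.h>
#include <stdbool.h>

enum {
    ICACHE_WAYS = 32,            /* ways per set */
    ICACHE_MAX_LOCKED_WAYS = 28  /* ways 0-27 are lockable; ways 28-31 never are */
};

typedef struct {
    int locked_ways; /* lines currently locked in this set, filling from way 0 */
} icache_set_model;

/* Models a lock command aimed at this set. Returns true if the line was
 * locked, false if the hardware would ignore the lock. In the ignored
 * case the line is still allocated into the cache, just not locked. */
bool icache_set_request_lock(icache_set_model *set)
{
    assert(set->locked_ways >= 0 &&
           set->locked_ways <= ICACHE_MAX_LOCKED_WAYS);
    if (set->locked_ways == ICACHE_MAX_LOCKED_WAYS) {
        return false;
    }
    set->locked_ways++;
    return true;
}

/* Ways still available to round-robin replacement in this set. */
int icache_set_replaceable_ways(const icache_set_model *set)
{
    return ICACHE_WAYS - set->locked_ways;
}
```

After 28 successful requests the model refuses further locks, so at least four ways per set always remain available for replacement.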
Cache lines can be locked into the instruction cache by initiating a write to coprocessor 15. (See Table 20, “Cache Lock-Down Functions” on page 83 for the exact command.) Register Rd contains the virtual address of the line to be locked into the cache.
Example 5. Enabling the Instruction Cache
; Enable the ICache
MRC P15, 0, R0, C1, C0, 0 ; Get the control register
ORR R0, R0, #0x1000       ; Set bit 12 -- the I bit
MCR P15, 0, R0, C1, C0, 0 ; Set the control register
CPWAIT
Example 6. Invalidating the Instruction Cache
MCR P15,0,R1,C7,C5,0 ; Invalidate the instruction cache and branch target buffer
CPWAIT
; The instruction cache is guaranteed to be invalidated at this point; the next
; instruction sees the result of the invalidate command.
There are several requirements for locking down code:
• The routine used to lock lines down in the cache must be placed in non-cacheable memory, which means the MMU is enabled. As a result, no fetches of cacheable code should occur while locking instructions into the cache.
• The code being locked into the cache must be cacheable.
• The instruction cache must be enabled and invalidated prior to locking down lines.
Failure to follow these requirements will produce unpredictable results when accessing the instruction cache.
System programmers should ensure that the code to lock instructions into the cache does not reside closer than 128 bytes to a non-cacheable/cacheable page boundary. If the processor fetches ahead into a cacheable page, then the first requirement noted above could be violated.
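A build-time or boot-time check for this placement rule might look like the following C sketch. It is a host-side illustration with hypothetical names. It assumes the caller knows the size of the page that holds the locking routine, and it checks only the distance from the end of the routine to the start of the next page, since that is the direction in which the processor fetches ahead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { LOCK_ROUTINE_GUARD_BYTES = 128 }; /* minimum distance stated in the text */

/* routine_end: address one past the last byte of the locking routine.
 * page_size:   size of the page holding the routine; must be a power of two.
 * Returns true when at least 128 bytes separate the end of the routine
 * from the start of the next page, which may be cacheable. */
bool lock_routine_clear_of_next_page(uint32_t routine_end, uint32_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    uint32_t offset_in_page = routine_end & (page_size - 1);
    /* An end address exactly on a boundary leaves no gap at all. */
    uint32_t gap = (offset_in_page == 0) ? 0 : page_size - offset_in_page;
    return gap >= LOCK_ROUTINE_GUARD_BYTES;
}
```

For example, with 4-Kbyte pages a routine ending at 0x1F80 leaves exactly 128 bytes before the next page and passes, while one ending at 0x1F81 does not.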
Lines are locked into a set starting at way 0 and may progress up to way 27; which set a line gets locked into depends on the set index of the virtual address. Figure 9 is an example (32-Kbyte cache) of where lines of code may be locked into the cache along with how the round-robin pointer is affected.
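To see which sets a routine will occupy, the set index can be computed from the virtual address. The C sketch below is a host-side illustration with hypothetical names. It assumes the 32-Kbyte example geometry of 32 sets and 32 ways, from which a 32-byte line size follows, and it assumes the set index is taken from the address bits directly above the line offset.

```c
#include <assert.h>
#include <stdint.h>

enum {
    ICACHE_BYTES = 32 * 1024, /* 32-Kbyte cache example */
    ICACHE_SETS = 32,
    ICACHE_WAYS = 32,
    ICACHE_LINE_BYTES = ICACHE_BYTES / (ICACHE_SETS * ICACHE_WAYS) /* 32 bytes */
};

/* Set that the line containing virtual_address maps to (assumed to be
 * the address bits directly above the line offset). */
uint32_t icache_set_index(uint32_t virtual_address)
{
    return (virtual_address / ICACHE_LINE_BYTES) % ICACHE_SETS;
}

/* Adds one to lines_per_set[s] for every cache line of the routine
 * [start, start + length) that maps to set s. The range is assumed
 * not to wrap past the end of the address space. */
void icache_count_lines_per_set(uint32_t start, uint32_t length,
                                int lines_per_set[ICACHE_SETS])
{
    assert(length > 0);
    uint32_t first_line = start / ICACHE_LINE_BYTES;
    uint32_t last_line = (start + length - 1) / ICACHE_LINE_BYTES;
    for (uint32_t line = first_line; line <= last_line; line++) {
        lines_per_set[line % ICACHE_SETS]++;
    }
}
```

Running icache_count_lines_per_set over each routine to be locked gives the number of locked lines per set, which can then be compared against the 28-line limit before any lock command is issued.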
Software can lock down several different routines located at different memory locations. This may cause some sets to have more locked lines than others, as shown in Figure 9.
Example 7 on page 58 shows how a routine, called “lockMe” in this example, might be locked into the instruction cache. Note that it is possible to receive an exception while locking code (see “Event Architecture” on page 154).
Figure 9. Locked Line Effect on Round-Robin Replacement
[Figure: 32-Kbyte cache example. Rows are sets 0 through 31, columns are ways 0 through 31, and the locked ways of each set are marked.]
set 0: 8 ways locked, 24 ways available for round-robin replacement
set 1: 23 ways locked, 9 ways available for round-robin replacement
set 2: 28 ways locked, only ways 28-31 available for replacement
set 31: all 32 ways available for round-robin replacement
The Intel XScale processor provides a global unlock command for the instruction cache.
Writing to coprocessor 15, register 9 unlocks all the locked lines in the instruction cache and leaves them valid. These lines then become available for the round-robin replacement algorithm. (See Table 20, “Cache Lock-Down Functions” on page 83 for the exact command.)