switch()/case/default
13.2 A lot of cases
CHAPTER 13. SWITCH()/CASE/DEFAULT
or $at, $zero ; load delay slot, NOP
jr $t9
la $a0, ($LC3 & 0xFFFF) # "something unknown" ; branch delay slot
loc_4C: # CODE XREF: f+14
lui $a0, ($LC2 >> 16) # "two"
lw $t9, (puts & 0xFFFF)($gp)
or $at, $zero ; load delay slot, NOP
jr $t9
la $a0, ($LC2 & 0xFFFF) # "two" ; branch delay slot
loc_60: # CODE XREF: f+8
lui $a0, ($LC1 >> 16) # "one"
lw $t9, (puts & 0xFFFF)($gp)
or $at, $zero ; load delay slot, NOP
jr $t9
la $a0, ($LC1 & 0xFFFF) # "one" ; branch delay slot
The function always ends by calling puts(), so here we see a jump to puts() (JR: “Jump Register”) instead of “jump and link”. We talked about this earlier: 13.1.1 on page 143.
We also often see NOP instructions after LW ones. This is the “load delay slot”: another delay slot in MIPS. The instruction following LW may execute while LW is still loading the value from memory, so it must not use the result of LW. Modern MIPS CPUs are able to wait if the next instruction uses the result of LW, so this is somewhat outdated, but GCC still adds NOPs for older MIPS CPUs. In general, it can be ignored.
13.1.7 Conclusion
A switch() with few cases is indistinguishable from an if/else construction, for example: listing 13.1.1.
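To make the conclusion concrete, here is a minimal C sketch of the two source forms that a compiler may turn into the same compare-and-branch sequence. The function names (name_switch, name_ifelse, check_equivalence) are hypothetical and the functions return the strings instead of printing them, so that the results can be compared; whether the generated code is really identical depends on the compiler and its options.

```c
#include <assert.h>
#include <string.h>

/* switch() form; the strings mirror the ones used in this chapter. */
const char *name_switch(int a)
{
    switch (a) {
    case 0:  return "zero";
    case 1:  return "one";
    case 2:  return "two";
    default: return "something unknown";
    }
}

/* The same logic written as an if/else chain. */
const char *name_ifelse(int a)
{
    if (a == 0)      return "zero";
    else if (a == 1) return "one";
    else if (a == 2) return "two";
    else             return "something unknown";
}

/* Both forms must agree for every input we try. */
void check_equivalence(void)
{
    for (int a = -2; a <= 5; a++)
        assert(strcmp(name_switch(a), name_ifelse(a)) == 0);
}
```

Calling check_equivalence() only shows that the two forms behave the same at the source level; comparing the compiler output is still the way to see whether the machine code matches.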
Listing 13.4: MSVC 2010
tv64 = -4 ; size = 4
_a$ = 8 ; size = 4
_f PROC
push ebp
mov ebp, esp
push ecx
mov eax, DWORD PTR _a$[ebp]
mov DWORD PTR tv64[ebp], eax
cmp DWORD PTR tv64[ebp], 4
ja SHORT $LN1@f
mov ecx, DWORD PTR tv64[ebp]
jmp DWORD PTR $LN11@f[ecx*4]
$LN6@f:
push OFFSET $SG739 ; 'zero', 0aH, 00H
call _printf
add esp, 4
jmp SHORT $LN9@f
$LN5@f:
push OFFSET $SG741 ; 'one', 0aH, 00H
call _printf
add esp, 4
jmp SHORT $LN9@f
$LN4@f:
push OFFSET $SG743 ; 'two', 0aH, 00H
call _printf
add esp, 4
jmp SHORT $LN9@f
$LN3@f:
push OFFSET $SG745 ; 'three', 0aH, 00H
call _printf
add esp, 4
jmp SHORT $LN9@f
$LN2@f:
push OFFSET $SG747 ; 'four', 0aH, 00H
call _printf
add esp, 4
jmp SHORT $LN9@f
$LN1@f:
push OFFSET $SG749 ; 'something unknown', 0aH, 00H
call _printf
add esp, 4
$LN9@f:
mov esp, ebp
pop ebp
ret 0
npad 2 ; align next label
$LN11@f:
DD $LN6@f ; 0
DD $LN5@f ; 1
DD $LN4@f ; 2
DD $LN3@f ; 3
DD $LN2@f ; 4
_f ENDP
What we see here is a set of printf() calls with various arguments. All of them have not only addresses in the memory of the process, but also internal symbolic labels assigned by the compiler. All these labels are also mentioned in the $LN11@f internal table.
At the function start, if a is greater than 4, control flow is passed to label $LN1@f, where printf() with argument 'something unknown' is called.
But if the value of a is less than or equal to 4, it gets multiplied by 4 and added to the $LN11@f table address. That is how an address inside the table is constructed, pointing exactly to the element we need. For example, let’s say a is equal to 2. 2 * 4 = 8 (all table elements are addresses in a 32-bit process, which is why all elements are 4 bytes wide). The address of the $LN11@f table + 8 is the table element where the $LN4@f label is stored. JMP fetches the $LN4@f address from the table and jumps to it.
This table is sometimes called a jumptable or branch table [3].
[3] The whole method was once called computed GOTO in early versions of FORTRAN: wikipedia. Not quite relevant these days, but what a term!
Then the corresponding printf() is called with argument 'two'. Literally, the jmp DWORD PTR $LN11@f[ecx*4] instruction means “jump to the address held in the DWORD that is stored at $LN11@f + ecx * 4”.
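The table lookup described above can be modelled in C roughly as follows. This is only a sketch: the enum constants are hypothetical stand-ins for the compiler's labels ($LN6@f and so on) rather than real addresses, and jump_target is an illustrative name, not something MSVC emits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for the labels $LN6@f ... $LN1@f (not real addresses). */
enum { LN6, LN5, LN4, LN3, LN2, LN1 };

/* Stand-in for the $LN11@f table: five 4-byte entries for cases 0..4. */
static const uint32_t jump_table[5] = { LN6, LN5, LN4, LN3, LN2 };

/* Returns the label that control would reach for input a. */
uint32_t jump_target(uint32_t a)
{
    assert(sizeof jump_table[0] == 4);   /* entries are 4 bytes wide */

    if (a > 4)                           /* cmp ..., 4 / ja SHORT $LN1@f */
        return LN1;                      /* default case */

    /* jmp DWORD PTR $LN11@f[ecx*4]: fetch 4 bytes at table + ecx*4 */
    const unsigned char *base = (const unsigned char *)jump_table;
    uint32_t target;
    memcpy(&target, base + a * 4, sizeof target);
    return target;
}
```

For a equal to 2 the byte offset is 8, so the third entry is read and jump_target(2) yields LN4, matching the walk-through in the text.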
npad (88 on page 854) is an assembly language macro that aligns the next label so that it is stored at an address aligned on a 4-byte (or 16-byte) boundary. This suits the processor well, since it can fetch 32-bit values from memory through the memory bus, cache memory, etc., more efficiently if they are aligned.
OllyDbg
Let’s try this example in OllyDbg. The input value of the function (2) is loaded into EAX:
Figure 13.9: OllyDbg: function’s input value is loaded in EAX
The input value is checked: is it greater than 4? It is not, so the “default” jump is not taken:
Figure 13.10: OllyDbg: 2 is no bigger than 4: no jump is taken
Here we see a jumptable:
Figure 13.11: OllyDbg: calculating destination address using jumptable
Here we’ve clicked “Follow in Dump” → “Address constant”, so now we see the jumptable in the data window. These are five 32-bit values [4]. ECX is now 2, so the element at index 2 (counting from zero) of the table is to be used. It’s also possible to click “Follow in Dump” → “Memory address” and OllyDbg will show the element addressed by the JMP instruction. That’s 0x010B103A.
[4] They are underlined by OllyDbg because these are also FIXUPs (68.2.6 on page 673); we are going to come back to them later.
After the jump we are at 0x010B103A: the code printing “two” will now be executed:
Figure 13.12: OllyDbg: now we are at the case: label
Non-optimizing GCC
Let’s see what GCC 4.4.1 generates:
Listing 13.5: GCC 4.4.1
public f
f proc near ; CODE XREF: main+10

var_18 = dword ptr -18h
arg_0 = dword ptr 8

push ebp
mov ebp, esp
sub esp, 18h
cmp [ebp+arg_0], 4
ja short loc_8048444
mov eax, [ebp+arg_0]
shl eax, 2
mov eax, ds:off_804855C[eax]
jmp eax

loc_80483FE: ; DATA XREF: .rodata:off_804855C
mov [esp+18h+var_18], offset aZero ; "zero"
call _puts
jmp short locret_8048450

loc_804840C: ; DATA XREF: .rodata:08048560
mov [esp+18h+var_18], offset aOne ; "one"
call _puts
jmp short locret_8048450

loc_804841A: ; DATA XREF: .rodata:08048564
mov [esp+18h+var_18], offset aTwo ; "two"
call _puts
jmp short locret_8048450

loc_8048428: ; DATA XREF: .rodata:08048568
mov [esp+18h+var_18], offset aThree ; "three"
call _puts
jmp short locret_8048450

loc_8048436: ; DATA XREF: .rodata:0804856C
mov [esp+18h+var_18], offset aFour ; "four"
call _puts
jmp short locret_8048450

loc_8048444: ; CODE XREF: f+A
mov [esp+18h+var_18], offset aSomethingUnkno ; "something unknown"
call _puts

locret_8048450: ; CODE XREF: f+26
; f+34...
leave
retn
f endp

off_804855C dd offset loc_80483FE ; DATA XREF: f+12
dd offset loc_804840C
dd offset loc_804841A
dd offset loc_8048428
dd offset loc_8048436
It is almost the same, with one nuance: argument arg_0 is multiplied by 4 by shifting it left by 2 bits, which is almost the same as multiplication by 4 (16.2.1 on page 204). Then the address of the label is taken from the off_804855C array and stored in EAX, and JMP EAX does the actual jump.
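As a small illustration of the SHL step, the following C sketch computes the address of a table entry the way the GCC code does. The base address 0x804855C comes from the listing; the function names are hypothetical, and the code only does arithmetic on numbers, it does not touch memory.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_BASE 0x804855Cu   /* off_804855C in Listing 13.5 */

/* Byte offset of entry 'index' in a table of 4-byte entries: shl eax, 2 */
uint32_t offset_by_shift(uint32_t index) { return index << 2; }

/* The same offset computed with an explicit multiplication. */
uint32_t offset_by_mul(uint32_t index) { return index * 4u; }

/* Address that "mov eax, ds:off_804855C[eax]" reads from. */
uint32_t table_entry_address(uint32_t index)
{
    return TABLE_BASE + offset_by_shift(index);
}

/* For every valid case value the two offsets agree. */
void check_offsets(void)
{
    for (uint32_t i = 0; i <= 4; i++)
        assert(offset_by_shift(i) == offset_by_mul(i));
}
```

The resulting addresses for entries 1 and 4 are 0x8048560 and 0x804856C, which are the .rodata addresses IDA shows in the DATA XREF comments of the listing.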
13.2.2 ARM: Optimizing Keil 6/2013 (ARM mode)
Listing 13.6: Optimizing Keil 6/2013 (ARM mode)
00000174 f2
00000174 05 00 50 E3 CMP R0, #5 ; switch 5 cases
00000178 00 F1 8F 30 ADDCC PC, PC, R0,LSL#2 ; switch jump
0000017C 0E 00 00 EA B default_case ; jumptable 00000178 default case
00000180
00000180 loc_180 ; CODE XREF: f2+4
00000180 03 00 00 EA B zero_case ; jumptable 00000178 case 0
00000184
00000184 loc_184 ; CODE XREF: f2+4
00000184 04 00 00 EA B one_case ; jumptable 00000178 case 1
00000188
00000188 loc_188 ; CODE XREF: f2+4
00000188 05 00 00 EA B two_case ; jumptable 00000178 case 2
0000018C
0000018C loc_18C ; CODE XREF: f2+4
0000018C 06 00 00 EA B three_case ; jumptable 00000178 case 3
00000190
00000190 loc_190 ; CODE XREF: f2+4
00000190 07 00 00 EA B four_case ; jumptable 00000178 case 4
00000194
00000194 zero_case ; CODE XREF: f2+4
00000194 ; f2:loc_180
00000194 EC 00 8F E2 ADR R0, aZero ; jumptable 00000178 case 0
00000198 06 00 00 EA B loc_1B8
0000019C
0000019C one_case ; CODE XREF: f2+4
0000019C ; f2:loc_184
0000019C EC 00 8F E2 ADR R0, aOne ; jumptable 00000178 case 1
000001A0 04 00 00 EA B loc_1B8
000001A4
000001A4 two_case ; CODE XREF: f2+4
000001A4 ; f2:loc_188
000001A4 01 0C 8F E2 ADR R0, aTwo ; jumptable 00000178 case 2
000001A8 02 00 00 EA B loc_1B8
000001AC
000001AC three_case ; CODE XREF: f2+4
000001AC ; f2:loc_18C
000001AC 01 0C 8F E2 ADR R0, aThree ; jumptable 00000178 case 3
000001B0 00 00 00 EA B loc_1B8
000001B4
000001B4 four_case ; CODE XREF: f2+4
000001B4 ; f2:loc_190
000001B4 01 0C 8F E2 ADR R0, aFour ; jumptable 00000178 case 4
000001B8
000001B8 loc_1B8 ; CODE XREF: f2+24
000001B8 ; f2+2C
000001B8 66 18 00 EA B __2printf
000001BC
000001BC default_case ; CODE XREF: f2+4
000001BC ; f2+8
000001BC D4 00 8F E2 ADR R0, aSomethingUnkno ; jumptable 00000178 default case
000001C0 FC FF FF EA B loc_1B8
This code makes use of the ARM mode feature in which all instructions have a fixed size of 4 bytes.
Let’s keep in mind that the maximum value for a is 4 and any greater value will cause the «something unknown\n» string to be printed.
The first CMP R0, #5 instruction compares the input value of a with 5.
The next instruction, ADDCC PC, PC, R0,LSL#2 [5], is executed only if R0 < 5 (CC = Carry clear, i.e., unsigned less than). Consequently, if ADDCC does not trigger (the R0 ≥ 5 case), a jump to the default_case label occurs.
But if R0 < 5 and ADDCC triggers, the following happens:
The value in R0 is multiplied by 4. In fact, LSL#2 at the instruction’s suffix stands for “shift left by 2 bits”. But as we will see later (16.2.1 on page 204) in the section “Shifts”, a shift left by 2 bits is equivalent to multiplying by 4.
Then we add R0 * 4 to the current value in PC, thus jumping to one of the B (Branch) instructions located below.
At the moment ADDCC executes, the value in PC (0x180) is 8 bytes ahead of the address at which the ADDCC instruction is located (0x178), or, in other words, 2 instructions ahead.
This is how the pipeline in ARM processors works: when ADDCC is executed, the processor is already beginning to process the instruction after the next one, which is why PC points there. This has to be memorized.
If a = 0, then nothing is added to the value in PC, and the actual value of PC (which is 8 bytes ahead) is written into PC, so a jump to the label loc_180 happens, which is 8 bytes ahead of the point where the ADDCC instruction is.
If a = 1, then PC + 8 + a * 4 = PC + 8 + 1 * 4 = PC + 12 = 0x184 is written to PC (here PC stands for the address of the ADDCC instruction, 0x178), which is the address of the loc_184 label.
With every 1 added to a, the resulting PC is increased by 4. 4 is the instruction length in ARM mode and also the length of each B instruction, of which there are five in a row.
Each of these five B instructions passes control further, to what was programmed in the switch(). Pointer loading of the corresponding string occurs there, etc.
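The PC arithmetic above can be checked with a few lines of C. This is a sketch that only reproduces the address calculation for Listing 13.6; the function name addcc_next_pc is hypothetical, and the constant 0x178 is the address of ADDCC taken from the listing.

```c
#include <assert.h>
#include <stdint.h>

#define ADDCC_ADDR 0x178u   /* address of ADDCC in Listing 13.6 */

/* Address of the instruction executed right after ADDCC for input a. */
uint32_t addcc_next_pc(uint32_t a)
{
    if (a < 5) {                                /* CC: ADDCC executes */
        uint32_t pc_as_read = ADDCC_ADDR + 8;   /* PC reads 2 instructions ahead */
        return pc_as_read + (a << 2);           /* one of the five B instructions */
    }
    return ADDCC_ADDR + 4;   /* ADDCC skipped: fall through to "B default_case" */
}

/* Successive case values land on successive B instructions, 4 bytes apart. */
void check_addcc(void)
{
    for (uint32_t a = 0; a < 4; a++)
        assert(addcc_next_pc(a + 1) - addcc_next_pc(a) == 4);
}
```

With these numbers, inputs 0 to 4 give 0x180, 0x184, 0x188, 0x18C and 0x190 (the labels loc_180 to loc_190), and any larger input gives 0x17C, the B default_case instruction.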
[5] ADD: addition
13.2.3 ARM: Optimizing Keil 6/2013 (Thumb mode)
Listing 13.7: Optimizing Keil 6/2013 (Thumb mode)
000000F6 EXPORT f2
000000F6 f2
000000F6 10 B5 PUSH {R4,LR}
000000F8 03 00 MOVS R3, R0
000000FA 06 F0 69 F8 BL __ARM_common_switch8_thumb ; switch 6 cases
000000FE 05 DCB 5
000000FF 04 06 08 0A 0C 10 DCB 4, 6, 8, 0xA, 0xC, 0x10 ; jump table for switch statement
00000105 00 ALIGN 2
00000106
00000106 zero_case ; CODE XREF: f2+4
00000106 8D A0 ADR R0, aZero ; jumptable 000000FA case 0
00000108 06 E0 B loc_118
0000010A
0000010A one_case ; CODE XREF: f2+4
0000010A 8E A0 ADR R0, aOne ; jumptable 000000FA case 1
0000010C 04 E0 B loc_118
0000010E
0000010E two_case ; CODE XREF: f2+4
0000010E 8F A0 ADR R0, aTwo ; jumptable 000000FA case 2
00000110 02 E0 B loc_118
00000112
00000112 three_case ; CODE XREF: f2+4
00000112 90 A0 ADR R0, aThree ; jumptable 000000FA case 3
00000114 00 E0 B loc_118
00000116
00000116 four_case ; CODE XREF: f2+4
00000116 91 A0 ADR R0, aFour ; jumptable 000000FA case 4
00000118
00000118 loc_118 ; CODE XREF: f2+12
00000118 ; f2+16
00000118 06 F0 6A F8 BL __2printf
0000011C 10 BD POP {R4,PC}
0000011E
0000011E default_case ; CODE XREF: f2+4
0000011E 82 A0 ADR R0, aSomethingUnkno ; jumptable 000000FA default case
00000120 FA E7 B loc_118
000061D0 EXPORT __ARM_common_switch8_thumb
000061D0 __ARM_common_switch8_thumb ; CODE XREF: example6_f2+4
000061D0 78 47 BX PC
000061D2 00 00 ALIGN 4
000061D2 ; End of function __ARM_common_switch8_thumb
000061D2
000061D4 __32__ARM_common_switch8_thumb ; CODE XREF: __ARM_common_switch8_thumb
000061D4 01 C0 5E E5 LDRB R12, [LR,#-1]
000061D8 0C 00 53 E1 CMP R3, R12
000061DC 0C 30 DE 27 LDRCSB R3, [LR,R12]
000061E0 03 30 DE 37 LDRCCB R3, [LR,R3]
000061E4 83 C0 8E E0 ADD R12, LR, R3,LSL#1
000061E8 1C FF 2F E1 BX R12
000061E8 ; End of function __32__ARM_common_switch8_thumb
One cannot be sure that all instructions in Thumb and Thumb-2 modes have the same size. It can even be said that in these modes the instructions have variable lengths, just like in x86.
So a special table is added that contains information about how many cases there are (not including the default case), and an offset for each, with a label to which control must be passed in the corresponding case.
A special function named __ARM_common_switch8_thumb is present here in order to deal with the table and pass control. It starts with BX PC, whose function is to switch the processor to ARM mode. Then you see the function for table processing.
It is too complex to describe it here now, so let’s omit it.
It is interesting to note that the function uses the LR register as a pointer to the table. Indeed, after this function is called, LR contains the address after the BL __ARM_common_switch8_thumb instruction, where the table starts.
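For readers who want to trace the helper anyway, here is a rough C model of its address arithmetic, built only from the bytes and instructions visible in Listing 13.7. It is a sketch, not Keil's code: the names after_bl, load_byte and switch8_target are hypothetical, and it assumes that LR holds the return address with its lowest bit set (the Thumb state bit), which is what the offsets in the table are consistent with.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes that follow the BL in Listing 13.7, starting at address 0xFE:
   the case count (5), five case offsets, then the default offset. */
static const uint8_t after_bl[] = { 5, 4, 6, 8, 0x0A, 0x0C, 0x10 };
#define AFTER_BL_ADDR 0xFEu

/* Reads one byte of the table by its address in the listing. */
static uint8_t load_byte(uint32_t addr)
{
    assert(addr >= AFTER_BL_ADDR && addr < AFTER_BL_ADDR + sizeof after_bl);
    return after_bl[addr - AFTER_BL_ADDR];
}

/* Address of the code that runs for the value passed in R3. */
uint32_t switch8_target(uint32_t r3)
{
    uint32_t lr  = AFTER_BL_ADDR | 1u;    /* assumed: return address, Thumb bit set */
    uint32_t r12 = load_byte(lr - 1);     /* LDRB R12, [LR,#-1]: number of cases */

    if (r3 >= r12)                        /* CMP R3, R12 */
        r3 = load_byte(lr + r12);         /* LDRCSB R3, [LR,R12]: default entry */
    else
        r3 = load_byte(lr + r3);          /* LDRCCB R3, [LR,R3]: entry for this case */

    r12 = lr + (r3 << 1);                 /* ADD R12, LR, R3,LSL#1 */
    return r12 & ~1u;                     /* BX R12: bit 0 only selects Thumb state */
}
```

Under these assumptions the inputs 0 to 4 map to 0x106, 0x10A, 0x10E, 0x112 and 0x116, and anything from 5 upwards maps to 0x11E, which agrees with the zero_case to four_case and default_case labels in the listing.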
It is also worth noting that the code is generated as a separate function in order to reuse it, so the compiler does not generate the same code for every switch() statement.
IDA successfully perceived it as a service function and a table, and added comments to the labels like jumptable 000000FA case 0.
13.2.4 MIPS
Listing 13.8: Optimizing GCC 4.4.5 (IDA)
f:
lui $gp, (__gnu_local_gp >> 16)
; jump to loc_24 if input value is lesser than 5:
sltiu $v0, $a0, 5
bnez $v0, loc_24
la $gp, (__gnu_local_gp & 0xFFFF) ; branch delay slot
; input value is greater or equal to 5.
; print "something unknown" and finish:
lui $a0, ($LC5 >> 16) # "something unknown"
lw $t9, (puts & 0xFFFF)($gp)
or $at, $zero ; NOP
jr $t9
la $a0, ($LC5 & 0xFFFF) # "something unknown" ; branch delay slot

loc_24: # CODE XREF: f+8
; load address of jumptable
; LA is pseudoinstruction, LUI and ADDIU pair are there in fact:
la $v0, off_120
; multiply input value by 4:
sll $a0, 2
; sum up multiplied value and jumptable address:
addu $a0, $v0, $a0
; load element from jumptable:
lw $v0, 0($a0)
or $at, $zero ; NOP
; jump to the address we got in jumptable:
jr $v0
or $at, $zero ; branch delay slot, NOP

sub_44: # DATA XREF: .rodata:0000012C
; print "three" and finish
lui $a0, ($LC3 >> 16) # "three"
lw $t9, (puts & 0xFFFF)($gp)
or $at, $zero ; NOP
jr $t9
la $a0, ($LC3 & 0xFFFF) # "three" ; branch delay slot

sub_58: # DATA XREF: .rodata:00000130
; print "four" and finish
lui $a0, ($LC4 >> 16) # "four"
lw $t9, (puts & 0xFFFF)($gp)
or $at, $zero ; NOP
jr $t9
la $a0, ($LC4 & 0xFFFF) # "four" ; branch delay slot

sub_6C: # DATA XREF: .rodata:off_120
; print "zero" and finish
lui $a0, ($LC0 >> 16) # "zero"
lw $t9, (puts & 0xFFFF)($gp)
or $at, $zero ; NOP
jr $t9
la $a0, ($LC0 & 0xFFFF) # "zero" ; branch delay slot

sub_80: # DATA XREF: .rodata:00000124
; print "one" and finish
lui $a0, ($LC1 >> 16) # "one"
lw $t9, (puts & 0xFFFF)($gp)
or $at, $zero ; NOP
jr $t9
la $a0, ($LC1 & 0xFFFF) # "one" ; branch delay slot

sub_94: # DATA XREF: .rodata:00000128
; print "two" and finish
lui $a0, ($LC2 >> 16) # "two"
lw $t9, (puts & 0xFFFF)($gp)
or $at, $zero ; NOP
jr $t9
la $a0, ($LC2 & 0xFFFF) # "two" ; branch delay slot

; may be placed in .rodata section:
off_120: .word sub_6C
.word sub_80
.word sub_94
.word sub_44
.word sub_58
The new instruction for us is SLTIU (“Set on Less Than Immediate Unsigned”). This is the same as SLTU (“Set on Less Than Unsigned”), but “I” stands for “immediate”, i.e., a number has to be specified in the instruction itself.
BNEZ is “Branch if Not Equal to Zero”.
The code is very close to that of the other ISAs. SLL (“Shift Word Left Logical”) does the multiplication by 4. MIPS is a 32-bit CPU after all, so all addresses in the jumptable are 32-bit ones.
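One detail worth illustrating is that the range check is unsigned. The C sketch below models SLTIU with the immediate 5 and shows that a single unsigned comparison sends negative inputs to the default path as well. The function names sltiu5 and table_index are hypothetical, and returning -1 for the default path is just a convention of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* sltiu $v0, $a0, 5 : result is 1 if a0 < 5 with both treated as unsigned. */
uint32_t sltiu5(uint32_t a0)
{
    return a0 < 5u ? 1u : 0u;
}

/* Index into off_120 for a signed C argument, or -1 for the default path. */
int table_index(int a)
{
    if (sltiu5((uint32_t)a))   /* bnez $v0, loc_24 */
        return a;              /* then: sll by 2, addu, lw, jr */
    return -1;                 /* falls through to "something unknown" */
}

/* Negative values look like huge unsigned numbers, so they fail the check. */
void check_unsigned_compare(void)
{
    assert(sltiu5((uint32_t)-1) == 0);
    assert(sltiu5(0) == 1);
}
```

The same idea is behind JA in the x86 listings and the CC condition in the ARM one: no separate test for values below zero is needed.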
13.2.5 Conclusion
Rough skeleton of switch():
Listing 13.9: x86
MOV REG, input
CMP REG, 4 ; maximal number of cases
JA default
SHL REG, 2 ; find element in table. shift for 3 bits in x64.
MOV REG, jump_table[REG]
JMP REG

case1:
; do something
JMP exit
case2:
; do something
JMP exit
case3:
; do something
JMP exit
case4:
; do something
JMP exit
case5:
; do something
JMP exit

default:
...

exit:
....

jump_table dd case1
dd case2
dd case3
dd case4
dd case5
The jump to the address in the jump table may also be implemented with a single instruction: JMP jump_table[REG*4], or JMP jump_table[REG*8] in x64.
A jumptable is just an array of pointers, like the one described later: 18.5 on page 271.
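In C terms, the skeleton corresponds roughly to an array of function pointers with a range check in front of it. The sketch below is written by hand to mirror Listing 13.9 and is not compiler output; the handler names, their return values and entry_offset are hypothetical and chosen only so that the result can be checked.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*handler_t)(void);

/* Stand-ins for the "do something" bodies of case1..case5 and default. */
static int case1(void)      { return 1; }
static int case2(void)      { return 2; }
static int case3(void)      { return 3; }
static int case4(void)      { return 4; }
static int case5(void)      { return 5; }
static int on_default(void) { return -1; }

/* jump_table dd case1, case2, ... : an array of code pointers. */
static const handler_t jump_table[] = { case1, case2, case3, case4, case5 };

int dispatch(unsigned reg)
{
    if (reg > 4)                 /* CMP REG, 4 / JA default */
        return on_default();
    return jump_table[reg]();    /* JMP jump_table[REG*4], or REG*8 in x64 */
}

/* Byte offset of an entry: index times the pointer size (4 or 8). */
size_t entry_offset(size_t reg)
{
    return reg * sizeof(handler_t);
}

/* The table has exactly one pointer per case. */
void check_table(void)
{
    assert(sizeof jump_table / sizeof jump_table[0] == 5);
}
```

The scale factor in the addressing mode is simply the size of one pointer, which is why it changes from 4 to 8 when moving from 32-bit x86 to x64.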