switch()/case/default
13.1 Small number of cases
CHAPTER 13. SWITCH()/CASE/DEFAULT CHAPTER 13. SWITCH()/CASE/DEFAULT
Chapter 13
    call _printf
    add esp, 4
    jmp SHORT $LN7@f
$LN2@f:
    push OFFSET $SG743 ; 'two', 0aH, 00H
    call _printf
    add esp, 4
    jmp SHORT $LN7@f
$LN1@f:
    push OFFSET $SG745 ; 'something unknown', 0aH, 00H
    call _printf
    add esp, 4
$LN7@f:
    mov esp, ebp
    pop ebp
    ret 0
_f ENDP
Our function with a few cases in switch() is in fact analogous to this construction:
#include <stdio.h>

void f (int a)
{
    if (a==0)
        printf ("zero\n");
    else if (a==1)
        printf ("one\n");
    else if (a==2)
        printf ("two\n");
    else
        printf ("something unknown\n");
}
If we work with a switch() with a few cases, it is impossible to be sure whether it was a real switch() in the source code or just a pack of if() statements. This implies that switch() is like syntactic sugar for a large number of nested if()s.
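For comparison, here is a minimal sketch of the two forms side by side. The names classify_switch and classify_if are hypothetical, and the functions return the string instead of calling printf() only so that the results are easy to compare; the switch() body is an assumption about what such source code would look like.

```c
/* Hypothetical helper: the switch() form. */
const char *classify_switch(int a)
{
    switch (a) {
    case 0:  return "zero\n";
    case 1:  return "one\n";
    case 2:  return "two\n";
    default: return "something unknown\n";
    }
}

/* Hypothetical helper: the same logic written as a chain of if()s. */
const char *classify_if(int a)
{
    if (a == 0)
        return "zero\n";
    else if (a == 1)
        return "one\n";
    else if (a == 2)
        return "two\n";
    else
        return "something unknown\n";
}
```

Both functions give the same answer for every input, which is why the compiled code alone does not tell us which form the programmer wrote.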
There is nothing especially new to us in the generated code, with the exception of the compiler moving input variable a to a temporary local variable tv641.
If we compile this in GCC 4.4.1, we’ll get almost the same result, even with maximal optimization turned on (-O3 option).
Optimizing MSVC
Now let’s turn on optimization in MSVC (/Ox): cl 1.c /Fa1.asm /Ox
Listing 13.2: MSVC
_a$ = 8 ; size = 4
_f PROC
    mov eax, DWORD PTR _a$[esp-4]
    sub eax, 0
    je SHORT $LN4@f
    sub eax, 1
    je SHORT $LN3@f
    sub eax, 1
    je SHORT $LN2@f
    mov DWORD PTR _a$[esp-4], OFFSET $SG791 ; 'something unknown', 0aH, 00H
    jmp _printf
$LN2@f:
    mov DWORD PTR _a$[esp-4], OFFSET $SG789 ; 'two', 0aH, 00H
    jmp _printf
$LN3@f:
    mov DWORD PTR _a$[esp-4], OFFSET $SG787 ; 'one', 0aH, 00H
    jmp _printf
$LN4@f:
    mov DWORD PTR _a$[esp-4], OFFSET $SG785 ; 'zero', 0aH, 00H
    jmp _printf
_f ENDP
1 Local variables in the stack are prefixed with tv: that is how MSVC names internal variables for its own needs.
Here we can see some dirty hacks.
First: the value of a is placed in EAX and 0 is subtracted from it. It sounds absurd, but this is done to check whether the value in EAX is 0. If it is, the ZF flag is set (subtracting 0 from 0 gives 0), the first conditional jump JE (Jump if Equal, or its synonym JZ, Jump if Zero) is triggered, and control flow passes to the $LN4@f label, where the 'zero' message is printed. If the first jump is not triggered, 1 is subtracted from the input value, and if at some stage the result is 0, the corresponding jump is triggered.
And if no jump gets triggered at all, the control flow passes to printf() with string argument 'something unknown'.
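The subtract-and-test chain can be modelled in C as follows. This is only a sketch: classify_by_sub is a hypothetical name, the eax variable stands in for the register, and the function returns the string instead of jumping to printf().

```c
#include <stdint.h>

/* Hypothetical model of the SUB/JE chain from the optimized listing. */
const char *classify_by_sub(int a)
{
    uint32_t eax = (uint32_t)a;   /* unsigned, so wraparound is well defined */

    eax -= 0;                     /* sub eax, 0: value unchanged, ZF set if it is 0 */
    if (eax == 0)                 /* je $LN4@f */
        return "zero\n";

    eax -= 1;                     /* sub eax, 1: zero now means a was 1 */
    if (eax == 0)                 /* je $LN3@f */
        return "one\n";

    eax -= 1;                     /* sub eax, 1: zero now means a was 2 */
    if (eax == 0)                 /* je $LN2@f */
        return "two\n";

    return "something unknown\n"; /* no jump was taken */
}
```

Each subtraction both updates the running value and produces the flag that the following JE tests, so no separate CMP instruction is needed.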
Second: we see something unusual: a string pointer is placed into the a variable, and then printf() is called not via CALL, but via JMP. There is a simple explanation for that: the caller pushes a value to the stack and calls our function via CALL. CALL itself pushes the return address (RA) to the stack and does an unconditional jump to our function’s address. At any point of its execution, our function (since it does not contain any instruction that moves the stack pointer) has the following stack layout:
• ESP points to RA
• ESP+4 points to the a variable
On the other side, when we need to call printf() here we need exactly the same stack layout, except for the first printf() argument, which needs to point to the string. And that is what our code does.
It replaces the function’s first argument with the address of the string and jumps to printf(), as if we had called printf() directly rather than our function f(). printf() prints the string to stdout and then executes the RET instruction, which pops RA from the stack, and control flow returns not to f() but to f()’s caller, bypassing the end of the f() function.
All this is possible because printf() is called right at the end of the f() function in all cases. In some way, it is similar to the longjmp() function (see footnote 2). And of course, it is all done for the sake of speed.
A similar case with the ARM compiler is described in the “printf() with several arguments” section (6.2.1 on page 45).
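What makes the JMP possible is that the call is the last thing f() does on every path. A minimal sketch of the distinction follows; f_tail and f_not_tail are hypothetical names, and whether a particular compiler actually emits a jump for the first one is an assumption that depends on the compiler and its options.

```c
#include <stdio.h>

/* The call is in tail position: nothing remains to be done in f_tail()
   after printf() returns, so a compiler may replace CALL+RET with a JMP. */
void f_tail(int a)
{
    if (a == 0)
        printf("zero\n");
    else
        printf("something unknown\n");
}

/* Not a tail call: the result of printf() is used afterwards, so control
   has to come back here, and a real CALL is required. */
int f_not_tail(int a)
{
    int n = printf("%s", a == 0 ? "zero\n" : "something unknown\n");
    return n + 1;   /* work done after the call */
}
```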
2 wikipedia
OllyDbg
Since this example is tricky, let’s trace it in OllyDbg.
OllyDbg can detect such switch() constructs, and it can add some useful comments. EAX is 2 in the beginning, that’s the function’s input value:
Figure 13.1: OllyDbg: EAX now contains the first (and only) function argument
0 is subtracted from 2 in EAX. Of course, EAX still contains 2. But the ZF flag is now 0, indicating that the resulting value is non-zero:
Figure 13.2: OllyDbg: SUB executed
DEC is executed and EAX now contains 1. But 1 is non-zero, so the ZF flag is still 0:
Figure 13.3: OllyDbg: first DEC executed
Next DEC is executed. EAX is finally 0 and the ZF flag gets set, because the result is zero:
Figure 13.4: OllyDbg: second DEC executed
OllyDbg shows that this jump is to be taken now.
A pointer to the string “two” is to be written into the stack now:
Figure 13.5: OllyDbg: pointer to the string is to be written at the place of the first argument
Please note: the current argument of the function is 2 and 2 is now in the stack at the address 0x001EF850.
MOV writes the pointer to the string at address 0x001EF850 (see the stack window). Then the jump happens. This is the first instruction of the printf() function in MSVCR100.DLL (this example was compiled with the /MD switch):
Figure 13.6: OllyDbg: first instruction of printf() in MSVCR100.DLL
Now printf() treats the string at 0x00FF3010 as its only argument and prints the string.
This is the last instruction of printf():
Figure 13.7: OllyDbg: last instruction of printf() in MSVCR100.DLL
The string “two” was just printed to the console window.
Now let’s press F7 or F8 (step over) and return…not to f(), but rather to main():
Figure 13.8: OllyDbg: return to main()
Yes, the jump was direct, from the guts of printf() to main(), because RA in the stack points not to some place in f(), but rather to main(). And CALL 0x00FF1000 was the actual instruction which called f().
13.1.2 ARM: Optimizing Keil 6/2013 (ARM mode)
.text:0000014C f1:
.text:0000014C 00 00 50 E3 CMP R0, #0
.text:00000150 13 0E 8F 02 ADREQ R0, aZero ; "zero\n"
.text:00000154 05 00 00 0A BEQ loc_170
.text:00000158 01 00 50 E3 CMP R0, #1
.text:0000015C 4B 0F 8F 02 ADREQ R0, aOne ; "one\n"
.text:00000160 02 00 00 0A BEQ loc_170
.text:00000164 02 00 50 E3 CMP R0, #2
.text:00000168 4A 0F 8F 12 ADRNE R0, aSomethingUnkno ; "something unknown\n"
.text:0000016C 4E 0F 8F 02 ADREQ R0, aTwo ; "two\n"
.text:00000170
.text:00000170 loc_170: ; CODE XREF: f1+8
.text:00000170 ; f1+14
.text:00000170 78 18 00 EA B __2printf
Again, by investigating this code we cannot say if it was a switch() in the original source code, or just a pack of if() statements.
Anyway, we see here predicated instructions again, like ADREQ (Equal), which is executed only if R0 = 0 and then loads the address of the string «zero\n» into R0. The next instruction, BEQ, redirects control flow to loc_170 if R0 = 0.
An astute reader may ask: will BEQ trigger correctly, given that the ADREQ before it has already filled the R0 register with another value? Yes, it will, since BEQ checks the flags set by the CMP instruction, and ADREQ does not modify any flags at all.
The rest of the instructions are already familiar to us. There is only one call to printf(), at the end, and we have already examined this trick here (6.2.1 on page 45). In the end, there are three paths to printf().
The last comparison, CMP R0, #2, is needed to check whether a = 2. If it is not, ADRNE loads a pointer to the string «something unknown\n» into R0: a has already been compared with 0 and 1, so we can be sure that it is not equal to either of them at this point. And if R0 = 2, ADREQ loads a pointer to the string «two\n» into R0.
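The interplay between CMP, the predicated ADR instructions and BEQ can be modelled in C. This is a sketch under stated assumptions: f1_model is a hypothetical name, the z variable stands in for the Z flag, and r0 is deliberately reused for both the input value and the string pointer, as in the listing.

```c
#include <stdint.h>

/* Hypothetical model of the ARM-mode code: returns what R0 holds
   at the moment of the final branch to printf(). */
const char *f1_model(int a)
{
    uintptr_t r0 = (uintptr_t)(unsigned int)a;
    int z;                                          /* models the Z flag */

    z = (r0 == 0);                                  /* CMP R0, #0 */
    if (z) r0 = (uintptr_t)"zero\n";                /* ADREQ: overwrites R0, flags untouched */
    if (z) goto loc_170;                            /* BEQ: tests Z, not R0 */

    z = (r0 == 1);                                  /* CMP R0, #1 */
    if (z) r0 = (uintptr_t)"one\n";                 /* ADREQ */
    if (z) goto loc_170;                            /* BEQ */

    z = (r0 == 2);                                  /* CMP R0, #2 */
    if (!z) r0 = (uintptr_t)"something unknown\n";  /* ADRNE */
    if (z)  r0 = (uintptr_t)"two\n";                /* ADREQ */

loc_170:
    return (const char *)r0;                        /* B __2printf would print this */
}
```

Note that each branch decision reads z, which was fixed by the preceding comparison, so overwriting r0 in between does not affect it.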
13.1.3 ARM: Optimizing Keil 6/2013 (Thumb mode)
.text:000000D4 f1:
.text:000000D4 10 B5 PUSH {R4,LR}
.text:000000D6 00 28 CMP R0, #0
.text:000000D8 05 D0 BEQ zero_case
.text:000000DA 01 28 CMP R0, #1
.text:000000DC 05 D0 BEQ one_case
.text:000000DE 02 28 CMP R0, #2
.text:000000E0 05 D0 BEQ two_case
.text:000000E2 91 A0 ADR R0, aSomethingUnkno ; "something unknown\n"
.text:000000E4 04 E0 B default_case
.text:000000E6 zero_case: ; CODE XREF: f1+4
.text:000000E6 95 A0 ADR R0, aZero ; "zero\n"
.text:000000E8 02 E0 B default_case
.text:000000EA one_case: ; CODE XREF: f1+8
.text:000000EA 96 A0 ADR R0, aOne ; "one\n"
.text:000000EC 00 E0 B default_case
.text:000000EE two_case: ; CODE XREF: f1+C
.text:000000EE 97 A0 ADR R0, aTwo ; "two\n"
.text:000000F0 default_case ; CODE XREF: f1+10
.text:000000F0 ; f1+14
.text:000000F0 06 F0 7E F8 BL __2printf
.text:000000F4 10 BD POP {R4,PC}
As was already mentioned, it is not possible to add conditional predicates to most instructions in Thumb mode, so the Thumb code here is somewhat similar to the easily understandable x86 CISC-style code.
13.1.4 ARM64: Non-optimizing GCC (Linaro) 4.9
.LC12:
.string "zero"
.LC13:
.string "one"
.LC14:
.string "two"
.LC15:
.string "something unknown"
f12:
stp x29, x30, [sp, -32]!
add x29, sp, 0
str w0, [x29,28]
ldr w0, [x29,28]
cmp w0, 1
beq .L34
cmp w0, 2
beq .L35
cmp w0, wzr
bne .L38 ; jump to default label
adrp x0, .LC12 ; "zero"
add x0, x0, :lo12:.LC12
bl puts
b .L32
.L34:
adrp x0, .LC13 ; "one"
add x0, x0, :lo12:.LC13
bl puts
b .L32
.L35:
adrp x0, .LC14 ; "two"
add x0, x0, :lo12:.LC14
bl puts
b .L32
.L38:
adrp x0, .LC15 ; "something unknown"
add x0, x0, :lo12:.LC15
bl puts
nop
.L32:
ldp x29, x30, [sp], 32
ret
The type of the input value is int, hence register W0 is used to hold it instead of the whole X0 register. The string pointers are passed to puts() using an ADRP/ADD instruction pair, just as was demonstrated in the “Hello, world!” example: 3.4.5 on page 16.
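The ADRP/ADD pair can be read as splitting an address into a 4 KiB page base and a 12-bit offset within that page. Below is a minimal sketch of that arithmetic; the helper names are hypothetical, and the model skips the fact that the real ADRP computes the page address relative to PC.

```c
#include <stdint.h>

/* What ADRP leaves in the register: the address with its low 12 bits cleared. */
uint64_t page_of(uint64_t addr)
{
    return addr & ~(uint64_t)0xFFF;
}

/* What the :lo12: operand supplies to ADD: the low 12 bits of the address. */
uint64_t lo12_of(uint64_t addr)
{
    return addr & 0xFFF;
}

/* The two instructions together reconstruct the full address. */
uint64_t adrp_add(uint64_t addr)
{
    return page_of(addr) + lo12_of(addr);
}
```

For an address such as 0x412345 (an arbitrary value chosen for illustration), the page part is 0x412000 and the low part is 0x345.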
13.1.5 ARM64: Optimizing GCC (Linaro) 4.9
f12:
cmp w0, 1
beq .L31
cmp w0, 2
beq .L32
cbz w0, .L35
; default case
adrp x0, .LC15 ; "something unknown"
add x0, x0, :lo12:.LC15
b puts
.L35:
adrp x0, .LC12 ; "zero"
add x0, x0, :lo12:.LC12
b puts
.L32:
adrp x0, .LC14 ; "two"
add x0, x0, :lo12:.LC14
b puts
.L31:
adrp x0, .LC13 ; "one"
add x0, x0, :lo12:.LC13
b puts
This is a better-optimized piece of code. The CBZ (Compare and Branch on Zero) instruction jumps if W0 is zero. There is also a direct jump to puts() instead of a call, as was explained before: 13.1.1 on page 143.
13.1.6 MIPS
Listing 13.3: Optimizing GCC 4.4.5 (IDA)
f:
lui $gp, (__gnu_local_gp >> 16)
; is it 1?
li $v0, 1
beq $a0, $v0, loc_60
la $gp, (__gnu_local_gp & 0xFFFF) ; branch delay slot
; is it 2?
li $v0, 2
beq $a0, $v0, loc_4C
or $at, $zero ; branch delay slot, NOP
; jump, if not equal to 0:
bnez $a0, loc_38
or $at, $zero ; branch delay slot, NOP
; zero case:
lui $a0, ($LC0 >> 16) # "zero"
lw $t9, (puts & 0xFFFF)($gp)
or $at, $zero ; load delay slot, NOP
jr $t9
la $a0, ($LC0 & 0xFFFF) # "zero" ; branch delay slot
# ---
loc_38: # CODE XREF: f+1C
lui $a0, ($LC3 >> 16) # "something unknown"
lw $t9, (puts & 0xFFFF)($gp)
or $at, $zero ; load delay slot, NOP
jr $t9
la $a0, ($LC3 & 0xFFFF) # "something unknown" ; branch delay slot
# ---
loc_4C: # CODE XREF: f+14
lui $a0, ($LC2 >> 16) # "two"
lw $t9, (puts & 0xFFFF)($gp)
or $at, $zero ; load delay slot, NOP
jr $t9
la $a0, ($LC2 & 0xFFFF) # "two" ; branch delay slot
# ---
loc_60: # CODE XREF: f+8
lui $a0, ($LC1 >> 16) # "one"
lw $t9, (puts & 0xFFFF)($gp)
or $at, $zero ; load delay slot, NOP
jr $t9
la $a0, ($LC1 & 0xFFFF) # "one" ; branch delay slot
The function always ends with a call to puts(), so here we see a jump to puts() (JR: “Jump Register”) instead of “jump and link”. We talked about this earlier: 13.1.1 on page 143.
We also often see NOP instructions after LW ones. This is the “load delay slot”: another delay slot in MIPS. The instruction next to LW may execute while LW is still loading the value from memory, so it must not use the result of LW. Modern MIPS CPUs are able to wait if the next instruction uses the result of LW, so this is somewhat outdated, but GCC still adds NOPs for older MIPS CPUs. In general, it can be ignored.
13.1.7 Conclusion
A switch() with few cases is indistinguishable from an if/else construction, for example: listing 13.1.1.