
From the document Reverse Engineering for Beginners (pages 177-189)


13.2 A lot of cases

CHAPTER 13. SWITCH()/CASE/DEFAULT

    or   $at, $zero ; load delay slot, NOP
    jr   $t9
    la   $a0, ($LC3 & 0xFFFF) # "something unknown" ; branch delay slot
# ---
loc_4C: # CODE XREF: f+14
    lui  $a0, ($LC2 >> 16) # "two"
    lw   $t9, (puts & 0xFFFF)($gp)
    or   $at, $zero ; load delay slot, NOP
    jr   $t9
    la   $a0, ($LC2 & 0xFFFF) # "two" ; branch delay slot
# ---
loc_60: # CODE XREF: f+8
    lui  $a0, ($LC1 >> 16) # "one"
    lw   $t9, (puts & 0xFFFF)($gp)
    or   $at, $zero ; load delay slot, NOP
    jr   $t9
    la   $a0, ($LC1 & 0xFFFF) # "one" ; branch delay slot

The function always ends by calling puts(), so here we see a jump to puts() (JR: “Jump Register”) instead of “jump and link”. We discussed this earlier: 13.1.1 on page 143.

We also often see NOP instructions after LW ones. This is the “load delay slot”, another kind of delay slot in MIPS. The instruction that follows LW may execute while LW is still loading the value from memory, so it must not use the result of LW. Modern MIPS CPUs can wait if the next instruction uses the result of LW, so this is somewhat outdated, but GCC still adds NOPs for older MIPS CPUs. In general, they can be ignored.

13.1.7 Conclusion

A switch() with few cases is indistinguishable from an if/else construction, for example: listing 13.1.1.
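To make the equivalence concrete, here is a minimal C sketch. The function names name_switch(), name_if_else() and same_result() are hypothetical, and the three case strings are assumed for illustration rather than taken from this excerpt. A compiler is free to emit the same compare-and-branch sequence for both forms.

```c
#include <string.h>

/* Hypothetical three-case switch(). */
const char *name_switch(int a)
{
    switch (a) {
    case 0:  return "zero";
    case 1:  return "one";
    case 2:  return "two";
    default: return "something unknown";
    }
}

/* The same logic written as an if/else chain. */
const char *name_if_else(int a)
{
    if (a == 0)      return "zero";
    else if (a == 1) return "one";
    else if (a == 2) return "two";
    else             return "something unknown";
}

/* Returns 1 when both forms give the same string for a. */
int same_result(int a)
{
    return strcmp(name_switch(a), name_if_else(a)) == 0;
}
```

Calling same_result() for any input, in range or not, should return 1, since the two functions differ only in how the source is written.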

Listing 13.4: MSVC 2010

tv64 = -4 ; size = 4
_a$ = 8   ; size = 4
_f PROC
    push  ebp
    mov   ebp, esp
    push  ecx
    mov   eax, DWORD PTR _a$[ebp]
    mov   DWORD PTR tv64[ebp], eax
    cmp   DWORD PTR tv64[ebp], 4
    ja    SHORT $LN1@f
    mov   ecx, DWORD PTR tv64[ebp]
    jmp   DWORD PTR $LN11@f[ecx*4]
$LN6@f:
    push  OFFSET $SG739 ; 'zero', 0aH, 00H
    call  _printf
    add   esp, 4
    jmp   SHORT $LN9@f
$LN5@f:
    push  OFFSET $SG741 ; 'one', 0aH, 00H
    call  _printf
    add   esp, 4
    jmp   SHORT $LN9@f
$LN4@f:
    push  OFFSET $SG743 ; 'two', 0aH, 00H
    call  _printf
    add   esp, 4
    jmp   SHORT $LN9@f
$LN3@f:
    push  OFFSET $SG745 ; 'three', 0aH, 00H
    call  _printf
    add   esp, 4
    jmp   SHORT $LN9@f
$LN2@f:
    push  OFFSET $SG747 ; 'four', 0aH, 00H
    call  _printf
    add   esp, 4
    jmp   SHORT $LN9@f
$LN1@f:
    push  OFFSET $SG749 ; 'something unknown', 0aH, 00H
    call  _printf
    add   esp, 4
$LN9@f:
    mov   esp, ebp
    pop   ebp
    ret   0
    npad  2 ; align next label
$LN11@f:
    DD    $LN6@f ; 0
    DD    $LN5@f ; 1
    DD    $LN4@f ; 2
    DD    $LN3@f ; 3
    DD    $LN2@f ; 4
_f ENDP

What we see here is a set of printf() calls with various arguments. Each of them has not only an address in the memory of the process, but also an internal symbolic label assigned by the compiler. All these labels are also listed in the $LN11@f internal table.

At the start of the function, if a is greater than 4, control flow is passed to label $LN1@f, where printf() is called with the argument 'something unknown'.

But if the value of a is less than or equal to 4, it is multiplied by 4 and added to the address of the $LN11@f table. That is how the address of the table element we need is constructed. For example, let’s say a is equal to 2. 2 * 4 = 8 (all table elements are addresses in a 32-bit process, which is why each element is 4 bytes wide). The address of the $LN11@f table + 8 is the table element where the $LN4@f label is stored. JMP fetches the $LN4@f address from the table and jumps to it.

This table is sometimes called a jumptable or branch table [3].

Then the corresponding printf() is called with the argument 'two'. Literally, the jmp DWORD PTR $LN11@f[ecx*4] instruction means: jump to the address held in the DWORD stored at $LN11@f + ecx * 4.

[3] The whole method was once called computed GOTO in early versions of FORTRAN: wikipedia. Not quite relevant these days, but what a term!
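The table walk described above can be sketched in C. This is only an illustration under stated assumptions: the function names are hypothetical, and since Listing 13.4 gives no addresses, the table stores the numeric part of each label ($LN6@f is stored as 6, and so on) in the order they appear in $LN11@f.

```c
#include <stdint.h>

/* Label numbers in the order of the $LN11@f table in Listing 13.4:
   $LN6@f, $LN5@f, $LN4@f, $LN3@f, $LN2@f. */
static const int ln11_table[5] = { 6, 5, 4, 3, 2 };

/* Byte offset of element `index` in a table of 4-byte addresses. */
uint32_t table_byte_offset(uint32_t index)
{
    return index * 4u;
}

/* Number N of the label $LNN@f that receives control for input a. */
int dispatch_label_number(uint32_t a)
{
    if (a > 4u)              /* cmp tv64, 4 / ja $LN1@f */
        return 1;            /* default case: $LN1@f */
    return ln11_table[a];    /* jmp DWORD PTR $LN11@f[ecx*4] */
}
```

For a equal to 2, table_byte_offset() gives 8 and dispatch_label_number() gives 4, i.e. $LN4@f, which matches the worked example in the text.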

npad (88 on page 854) is an assembly language macro that aligns the next label so that it is stored at an address aligned on a 4-byte (or 16-byte) boundary. This suits the processor well, since it can fetch 32-bit values from memory through the memory bus, cache memory, etc., more efficiently if they are aligned.
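The arithmetic behind such alignment can be sketched as follows. The helper names align_up() and padding_needed() are hypothetical, and the addresses used in the usage note are made up for illustration; the listing does not show the real ones.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align.
   align must be a non-zero power of two (4, 16, ...). */
uint32_t align_up(uint32_t addr, uint32_t align)
{
    assert(align != 0u && (align & (align - 1u)) == 0u);
    return (addr + align - 1u) & ~(align - 1u);
}

/* Number of padding bytes that must precede an aligned label. */
uint32_t padding_needed(uint32_t addr, uint32_t align)
{
    return align_up(addr, align) - addr;
}
```

If the code before the table happened to end at an address such as 0x1002, padding_needed(0x1002, 4) would be 2, which is consistent with an npad 2 directive placed before a table of 4-byte entries.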


OllyDbg

Let’s try this example in OllyDbg. The input value of the function (2) is loaded into EAX:

Figure 13.9: OllyDbg: function’s input value is loaded in EAX


The input value is checked: is it bigger than 4? If not, the “default” jump is not taken:

Figure 13.10: OllyDbg: 2 is no bigger than 4: no jump is taken
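The check uses JA, which is an unsigned comparison on x86, so one instruction also sends negative inputs to the default case. A minimal sketch of that idea, assuming a is a signed 32-bit integer (the C source of f() is not part of this excerpt) and using a hypothetical function name:

```c
#include <stdint.h>

/* Returns 1 when the "default" jump would be taken.
   The cast mirrors JA: the bits of a are compared as an
   unsigned 32-bit number, so -1 becomes 0xFFFFFFFF. */
int goes_to_default(int32_t a)
{
    return (uint32_t)a > 4u;
}
```

With this reading, goes_to_default(2) is 0, as in the OllyDbg session, while both 5 and -1 give 1.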


Here we see a jumptable:

Figure 13.11: OllyDbg: calculating destination address using jumptable

Here we’ve clicked “Follow in Dump” → “Address constant”, so now we see the jumptable in the data window. These are 5 32-bit values [4]. ECX is now 2, so the element at index 2 (counting from zero) of the table is to be used. It’s also possible to click “Follow in Dump” → “Memory address” and OllyDbg will show the element addressed by the JMP instruction. That’s 0x010B103A.

[4] OllyDbg underlines them because they are also FIXUPs (68.2.6 on page 673); we are going to come back to them later.

After the jump we are at 0x010B103A: the code printing “two” will now be executed:

Figure 13.12: OllyDbg: now we are at the case: label

Non-optimizing GCC

Let’s see what GCC 4.4.1 generates:

Listing 13.5: GCC 4.4.1

    public f
f proc near ; CODE XREF: main+10

var_18 = dword ptr -18h
arg_0  = dword ptr  8

    push  ebp
    mov   ebp, esp
    sub   esp, 18h
    cmp   [ebp+arg_0], 4
    ja    short loc_8048444
    mov   eax, [ebp+arg_0]
    shl   eax, 2
    mov   eax, ds:off_804855C[eax]
    jmp   eax

loc_80483FE: ; DATA XREF: .rodata:off_804855C
    mov   [esp+18h+var_18], offset aZero ; "zero"
    call  _puts
    jmp   short locret_8048450

loc_804840C: ; DATA XREF: .rodata:08048560
    mov   [esp+18h+var_18], offset aOne ; "one"
    call  _puts
    jmp   short locret_8048450

loc_804841A: ; DATA XREF: .rodata:08048564
    mov   [esp+18h+var_18], offset aTwo ; "two"
    call  _puts
    jmp   short locret_8048450

loc_8048428: ; DATA XREF: .rodata:08048568
    mov   [esp+18h+var_18], offset aThree ; "three"
    call  _puts
    jmp   short locret_8048450

loc_8048436: ; DATA XREF: .rodata:0804856C
    mov   [esp+18h+var_18], offset aFour ; "four"
    call  _puts
    jmp   short locret_8048450

loc_8048444: ; CODE XREF: f+A
    mov   [esp+18h+var_18], offset aSomethingUnkno ; "something unknown"
    call  _puts

locret_8048450: ; CODE XREF: f+26
                ; f+34...
    leave
    retn
f endp

off_804855C dd offset loc_80483FE ; DATA XREF: f+12
            dd offset loc_804840C
            dd offset loc_804841A
            dd offset loc_8048428
            dd offset loc_8048436

It is almost the same, with one nuance: the argument arg_0 is multiplied by 4 by shifting it left by 2 bits, which has the same effect as multiplication by 4 (16.2.1 on page 204). Then the address of the label is taken from the off_804855C array and stored in EAX, and JMP EAX does the actual jump.
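Because Listing 13.5 shows concrete addresses, the lookup can be replayed in C and compared against the DATA XREF comments. This is a sketch with hypothetical function names; the constants are copied from the listing.

```c
#include <stdint.h>

#define TABLE_BASE     0x0804855Cu  /* off_804855C */
#define DEFAULT_TARGET 0x08048444u  /* loc_8048444 */

/* Table contents as listed after the function body. */
static const uint32_t off_804855C[5] = {
    0x080483FEu, 0x0804840Cu, 0x0804841Au, 0x08048428u, 0x08048436u
};

/* Address of the table element used for input a. */
uint32_t element_address(uint32_t a)
{
    return TABLE_BASE + (a << 2);        /* shift by 2 == multiply by 4 */
}

/* Address that JMP EAX transfers control to. */
uint32_t jump_target(uint32_t a)
{
    if (a > 4u)                          /* cmp [ebp+arg_0], 4 / ja */
        return DEFAULT_TARGET;
    uint32_t eax = a << 2;               /* shl eax, 2 */
    return off_804855C[eax / 4u];        /* mov eax, ds:off_804855C[eax] */
}
```

For example, element_address(1) is 0x08048560, the address named in the DATA XREF comment at loc_804840C, and jump_target(2) is 0x0804841A, the block that prints "two".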

13.2.2 ARM: Optimizing Keil 6/2013 (ARM mode)

Listing 13.6: Optimizing Keil 6/2013 (ARM mode)

00000174             f2
00000174 05 00 50 E3   CMP   R0, #5           ; switch 5 cases
00000178 00 F1 8F 30   ADDCC PC, PC, R0,LSL#2 ; switch jump
0000017C 0E 00 00 EA   B     default_case     ; jumptable 00000178 default case
00000180
00000180             loc_180 ; CODE XREF: f2+4
00000180 03 00 00 EA   B     zero_case        ; jumptable 00000178 case 0
00000184
00000184             loc_184 ; CODE XREF: f2+4
00000184 04 00 00 EA   B     one_case         ; jumptable 00000178 case 1
00000188
00000188             loc_188 ; CODE XREF: f2+4
00000188 05 00 00 EA   B     two_case         ; jumptable 00000178 case 2
0000018C
0000018C             loc_18C ; CODE XREF: f2+4
0000018C 06 00 00 EA   B     three_case       ; jumptable 00000178 case 3
00000190
00000190             loc_190 ; CODE XREF: f2+4
00000190 07 00 00 EA   B     four_case        ; jumptable 00000178 case 4
00000194
00000194             zero_case ; CODE XREF: f2+4
00000194                       ; f2:loc_180
00000194 EC 00 8F E2   ADR   R0, aZero        ; jumptable 00000178 case 0
00000198 06 00 00 EA   B     loc_1B8
0000019C
0000019C             one_case ; CODE XREF: f2+4
0000019C                      ; f2:loc_184
0000019C EC 00 8F E2   ADR   R0, aOne         ; jumptable 00000178 case 1
000001A0 04 00 00 EA   B     loc_1B8
000001A4
000001A4             two_case ; CODE XREF: f2+4
000001A4                      ; f2:loc_188
000001A4 01 0C 8F E2   ADR   R0, aTwo         ; jumptable 00000178 case 2
000001A8 02 00 00 EA   B     loc_1B8
000001AC
000001AC             three_case ; CODE XREF: f2+4
000001AC                        ; f2:loc_18C
000001AC 01 0C 8F E2   ADR   R0, aThree       ; jumptable 00000178 case 3
000001B0 00 00 00 EA   B     loc_1B8
000001B4
000001B4             four_case ; CODE XREF: f2+4
000001B4                       ; f2:loc_190
000001B4 01 0C 8F E2   ADR   R0, aFour        ; jumptable 00000178 case 4
000001B8
000001B8             loc_1B8 ; CODE XREF: f2+24
000001B8                     ; f2+2C
000001B8 66 18 00 EA   B     __2printf
000001BC
000001BC             default_case ; CODE XREF: f2+4
000001BC                          ; f2+8
000001BC D4 00 8F E2   ADR   R0, aSomethingUnkno ; jumptable 00000178 default case
000001C0 FC FF FF EA   B     loc_1B8

This code makes use of the ARM mode feature in which all instructions have a fixed size of 4 bytes.

Let’s keep in mind that the maximum value for a is 4 and any greater value will cause the «something unknown\n» string to be printed.

The first instruction, CMP R0, #5, compares the input value of a with 5.

The next instruction, ADDCC PC, PC, R0,LSL#2 [5], is executed only if R0 < 5 (CC = Carry Clear, i.e. unsigned lower). Consequently, if ADDCC does not trigger (the R0 ≥ 5 case), execution falls through to the B default_case instruction that follows, and a jump to the default_case label occurs.

But if R0 < 5 and ADDCC triggers, the following happens:

The value in R0 is multiplied by 4. In fact, LSL#2 at the instruction’s suffix stands for “shift left by 2 bits”. But as we will see later (16.2.1 on page 204) in section “Shifts”, a shift left by 2 bits is equivalent to multiplying by 4.

Then we add R0 * 4 to the current value in PC, thus jumping to one of the B (Branch) instructions located below.

At the moment ADDCC executes, the value in PC is 8 bytes ahead (0x180) of the address at which the ADDCC instruction is located (0x178), or, in other words, 2 instructions ahead.

This is how the pipeline in ARM processors works: when ADDCC is executed, the processor is already beginning to process the instruction after the next one, which is why PC points there. This has to be memorized.

If a = 0, then 0 is added to the value in PC, so the current value of PC (which is 8 bytes ahead) is written back into PC, and a jump to the label loc_180 happens, which is 8 bytes past the ADDCC instruction.

If a = 1, then PC + 8 + a * 4 = PC + 8 + 1 * 4 = PC + 12 = 0x184 is written to PC (here PC stands for the address of ADDCC, 0x178); this is the address of the loc_184 label.

With every 1 added to a, the resulting PC increases by 4. 4 is the instruction length in ARM mode and therefore also the length of each B instruction, of which there are 5 in a row.

Each of these five B instructions passes control further, to what was programmed in the switch(). The pointer to the corresponding string is loaded there, etc.
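The PC arithmetic above can be checked with a few lines of C. This is a sketch, not an emulator: the function name is hypothetical and the only inputs are the addresses visible in Listing 13.6.

```c
#include <stdint.h>

#define ADDCC_ADDR 0x178u  /* address of ADDCC in Listing 13.6 */

/* Address of the instruction executed right after ADDCC for input a. */
uint32_t arm_switch_target(uint32_t a)
{
    uint32_t pc_seen = ADDCC_ADDR + 8u;  /* in ARM mode PC reads 8 bytes ahead */
    if (a < 5u)                          /* CMP R0, #5: ADDCC runs only if R0 < 5 */
        return pc_seen + (a << 2);       /* one of the five B instructions */
    return ADDCC_ADDR + 4u;              /* fall through to B default_case */
}
```

The results line up with the labels in the listing: 0 gives 0x180 (loc_180), 1 gives 0x184 (loc_184), 4 gives 0x190 (loc_190), and any value of 5 or more gives 0x17C, where B default_case sits.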

[5] ADD: addition.

13.2.3 ARM: Optimizing Keil 6/2013 (Thumb mode)

Listing 13.7: Optimizing Keil 6/2013 (Thumb mode)

000000F6             EXPORT f2
000000F6             f2
000000F6 10 B5         PUSH  {R4,LR}
000000F8 03 00         MOVS  R3, R0
000000FA 06 F0 69 F8   BL    __ARM_common_switch8_thumb ; switch 6 cases
000000FE 05            DCB 5
000000FF 04 06 08 0A 0C 10 DCB 4, 6, 8, 0xA, 0xC, 0x10 ; jump table for switch statement
00000105 00            ALIGN 2
00000106
00000106             zero_case ; CODE XREF: f2+4
00000106 8D A0         ADR   R0, aZero ; jumptable 000000FA case 0
00000108 06 E0         B     loc_118
0000010A
0000010A             one_case ; CODE XREF: f2+4
0000010A 8E A0         ADR   R0, aOne ; jumptable 000000FA case 1
0000010C 04 E0         B     loc_118
0000010E
0000010E             two_case ; CODE XREF: f2+4
0000010E 8F A0         ADR   R0, aTwo ; jumptable 000000FA case 2
00000110 02 E0         B     loc_118
00000112
00000112             three_case ; CODE XREF: f2+4
00000112 90 A0         ADR   R0, aThree ; jumptable 000000FA case 3
00000114 00 E0         B     loc_118
00000116
00000116             four_case ; CODE XREF: f2+4
00000116 91 A0         ADR   R0, aFour ; jumptable 000000FA case 4
00000118
00000118             loc_118 ; CODE XREF: f2+12
00000118                     ; f2+16
00000118 06 F0 6A F8   BL    __2printf
0000011C 10 BD         POP   {R4,PC}
0000011E
0000011E             default_case ; CODE XREF: f2+4
0000011E 82 A0         ADR   R0, aSomethingUnkno ; jumptable 000000FA default case
00000120 FA E7         B     loc_118

000061D0             EXPORT __ARM_common_switch8_thumb
000061D0             __ARM_common_switch8_thumb ; CODE XREF: example6_f2+4
000061D0 78 47         BX    PC
000061D2 00 00         ALIGN 4
000061D2             ; End of function __ARM_common_switch8_thumb
000061D2
000061D4             __32__ARM_common_switch8_thumb ; CODE XREF: __ARM_common_switch8_thumb
000061D4 01 C0 5E E5   LDRB   R12, [LR,#-1]
000061D8 0C 00 53 E1   CMP    R3, R12
000061DC 0C 30 DE 27   LDRCSB R3, [LR,R12]
000061E0 03 30 DE 37   LDRCCB R3, [LR,R3]
000061E4 83 C0 8E E0   ADD    R12, LR, R3,LSL#1
000061E8 1C FF 2F E1   BX     R12
000061E8             ; End of function __32__ARM_common_switch8_thumb

One cannot be sure that all instructions in Thumb and Thumb-2 modes have the same size. It can even be said that in these modes the instructions have variable lengths, just like in x86.

So a special table is added that contains the number of cases (not including the default case), and an offset for each, identifying the label to which control must be passed in the corresponding case.

A special function named __ARM_common_switch8_thumb is present here to deal with the table and pass control. It starts with BX PC, whose purpose is to switch the processor to ARM mode. Then you see the function for table processing.

It is too complex to describe it here now, so let’s omit it.

It is interesting to note that the function uses the LR register as a pointer to the table. Indeed, after this function is called, LR contains the address after the BL __ARM_common_switch8_thumb instruction, where the table starts.

It is also worth noting that the code is generated as a separate function in order to reuse it, so the compiler does not generate the same code for every switch() statement.

IDA successfully recognized it as a service function and a table, and added comments to the labels like jumptable 000000FA case 0.
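Although the text leaves the helper undescribed, its six ARM instructions can be traced against the table bytes in Listing 13.7. The following is a rough C emulation offered as a sketch only. It assumes that BL executed in Thumb state leaves bit 0 of LR set, so LR is 0xFF rather than 0xFE here; the names image, load_byte() and switch8_target() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define IMAGE_BASE 0xFEu  /* address of the first table byte (DCB 5) */

/* Bytes at 0xFE..0x104: case count, five case offsets, default offset. */
static const uint8_t image[] = { 0x05, 0x04, 0x06, 0x08, 0x0A, 0x0C, 0x10 };

static uint8_t load_byte(uint32_t addr)
{
    assert(addr >= IMAGE_BASE && addr - IMAGE_BASE < sizeof image);
    return image[addr - IMAGE_BASE];
}

/* Address of the Thumb code reached for the value passed in R3. */
uint32_t switch8_target(uint32_t r3)
{
    uint32_t lr  = 0xFEu | 1u;           /* return address of BL, Thumb bit set */
    uint32_t r12 = load_byte(lr - 1u);   /* LDRB R12, [LR,#-1]: number of cases */
    if (r3 >= r12)                       /* CMP R3, R12 */
        r3 = load_byte(lr + r12);        /* LDRCSB R3, [LR,R12]: default entry */
    else
        r3 = load_byte(lr + r3);         /* LDRCCB R3, [LR,R3]: entry for the case */
    r12 = lr + (r3 << 1);                /* ADD R12, LR, R3,LSL#1 */
    return r12 & ~1u;                    /* BX R12: bit 0 only selects Thumb mode */
}
```

Under these assumptions the results agree with the labels IDA assigned: inputs 0 to 4 give 0x106, 0x10A, 0x10E, 0x112 and 0x116, and anything larger gives 0x11E (default_case). Each table byte is therefore half the distance from LR to the target, which fits Thumb instructions being 2-byte aligned.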

13.2.4 MIPS

Listing 13.8: Optimizing GCC 4.4.5 (IDA)

f:
    lui   $gp, (__gnu_local_gp >> 16)
; jump to loc_24 if input value is lesser than 5:
    sltiu $v0, $a0, 5
    bnez  $v0, loc_24
    la    $gp, (__gnu_local_gp & 0xFFFF) ; branch delay slot
; input value is greater or equal to 5.
; print "something unknown" and finish:
    lui   $a0, ($LC5 >> 16) # "something unknown"
    lw    $t9, (puts & 0xFFFF)($gp)
    or    $at, $zero ; NOP
    jr    $t9
    la    $a0, ($LC5 & 0xFFFF) # "something unknown" ; branch delay slot

loc_24: # CODE XREF: f+8
; load address of jumptable
; LA is pseudoinstruction, LUI and ADDIU pair are there in fact:
    la    $v0, off_120
; multiply input value by 4:
    sll   $a0, 2
; sum up multiplied value and jumptable address:
    addu  $a0, $v0, $a0
; load element from jumptable:
    lw    $v0, 0($a0)
    or    $at, $zero ; NOP
; jump to the address we got in jumptable:
    jr    $v0
    or    $at, $zero ; branch delay slot, NOP

sub_44: # DATA XREF: .rodata:0000012C
; print "three" and finish
    lui   $a0, ($LC3 >> 16) # "three"
    lw    $t9, (puts & 0xFFFF)($gp)
    or    $at, $zero ; NOP
    jr    $t9
    la    $a0, ($LC3 & 0xFFFF) # "three" ; branch delay slot

sub_58: # DATA XREF: .rodata:00000130
; print "four" and finish
    lui   $a0, ($LC4 >> 16) # "four"
    lw    $t9, (puts & 0xFFFF)($gp)
    or    $at, $zero ; NOP
    jr    $t9
    la    $a0, ($LC4 & 0xFFFF) # "four" ; branch delay slot

sub_6C: # DATA XREF: .rodata:off_120
; print "zero" and finish
    lui   $a0, ($LC0 >> 16) # "zero"
    lw    $t9, (puts & 0xFFFF)($gp)
    or    $at, $zero ; NOP
    jr    $t9
    la    $a0, ($LC0 & 0xFFFF) # "zero" ; branch delay slot

sub_80: # DATA XREF: .rodata:00000124
; print "one" and finish
    lui   $a0, ($LC1 >> 16) # "one"
    lw    $t9, (puts & 0xFFFF)($gp)
    or    $at, $zero ; NOP
    jr    $t9
    la    $a0, ($LC1 & 0xFFFF) # "one" ; branch delay slot

sub_94: # DATA XREF: .rodata:00000128
; print "two" and finish
    lui   $a0, ($LC2 >> 16) # "two"
    lw    $t9, (puts & 0xFFFF)($gp)
    or    $at, $zero ; NOP
    jr    $t9
    la    $a0, ($LC2 & 0xFFFF) # "two" ; branch delay slot

; may be placed in .rodata section:
off_120:
    .word sub_6C
    .word sub_80
    .word sub_94
    .word sub_44
    .word sub_58

The new instruction for us is SLTIU (“Set on Less Than Immediate Unsigned”). This is the same as SLTU (“Set on Less Than Unsigned”), but “I” stands for “immediate”, i.e., a number has to be specified in the instruction itself.

BNEZ is “Branch if Not Equal to Zero”.

The code is very close to that of the other ISAs. SLL (“Shift Word Left Logical”) does the multiplication by 4. MIPS is a 32-bit CPU after all, so all addresses in the jumptable are 32-bit ones.
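The MIPS sequence can be replayed in C in the same spirit. This sketch assumes that IDA's sub_XX names encode the addresses of the blocks (so sub_44 is at 0x44), which is its usual naming convention but is not stated in the listing. The function names and the MIPS_DEFAULT_PATH sentinel are hypothetical.

```c
#include <stdint.h>

#define OFF_120           0x120u        /* address of the jumptable */
#define MIPS_DEFAULT_PATH 0xFFFFFFFFu   /* sentinel: table not used */

/* Table contents: sub_6C, sub_80, sub_94, sub_44, sub_58. */
static const uint32_t off_120[5] = { 0x6Cu, 0x80u, 0x94u, 0x44u, 0x58u };

/* sltiu rd, rs, imm: 1 if rs < imm as unsigned numbers, else 0. */
uint32_t sltiu(uint32_t rs, uint32_t imm)
{
    return rs < imm ? 1u : 0u;
}

/* Address of the table element loaded by lw $v0, 0($a0). */
uint32_t mips_element_address(uint32_t a0)
{
    return OFF_120 + (a0 << 2);          /* sll $a0, 2 ; addu $a0, $v0, $a0 */
}

/* Address that jr $v0 jumps to, or the sentinel for the default path. */
uint32_t mips_target(uint32_t a0)
{
    if (sltiu(a0, 5u) == 0u)             /* bnez $v0, loc_24 not taken */
        return MIPS_DEFAULT_PATH;
    return off_120[a0];
}
```

For input 3, mips_element_address() gives 0x12C, the address in the DATA XREF comment at sub_44, and mips_target() gives 0x44, the block that prints "three".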

13.2.5 Conclusion

Rough skeleton of switch():

Listing 13.9: x86

    MOV REG, input
    CMP REG, 4 ; maximal number of cases
    JA default
    SHL REG, 2 ; find element in table. shift for 3 bits in x64.
    MOV REG, jump_table[REG]
    JMP REG

case1:
    ; do something
    JMP exit
case2:
    ; do something
    JMP exit
case3:
    ; do something
    JMP exit
case4:
    ; do something
    JMP exit
case5:
    ; do something
    JMP exit

default:
    ...

exit:
    ....

jump_table dd case1
           dd case2
           dd case3
           dd case4
           dd case5

The jump to the address in the jump table may also be implemented using this instruction: JMP jump_table[REG*4]. Or JMP jump_table[REG*8] in x64.

A jumptable is just an array of pointers, like the one described later: 18.5 on page 271.
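Seen from C, the same skeleton is an array of function pointers indexed by the input. The sketch below uses hypothetical handler names; the scaling by 4 or 8 that the assembly does by hand is implied by the element size, which jump_table_stride() reports.

```c
#include <stddef.h>
#include <string.h>

typedef const char *(*handler_t)(void);

static const char *case_zero(void)    { return "zero"; }
static const char *case_one(void)     { return "one"; }
static const char *case_two(void)     { return "two"; }
static const char *case_three(void)   { return "three"; }
static const char *case_four(void)    { return "four"; }
static const char *case_default(void) { return "something unknown"; }

/* The jumptable: one pointer per case, in case order. */
static const handler_t jump_table[] = {
    case_zero, case_one, case_two, case_three, case_four
};

/* Bounds check first (CMP/JA), then an indexed indirect call. */
const char *dispatch(unsigned a)
{
    size_t n = sizeof jump_table / sizeof jump_table[0];
    if (a >= n)
        return case_default();
    return jump_table[a]();
}

/* Size of one table element: the factor the index is scaled by. */
size_t jump_table_stride(void)
{
    return sizeof jump_table[0];
}

/* Returns 1 when dispatch(a) yields the expected string. */
int dispatch_is(unsigned a, const char *expected)
{
    return strcmp(dispatch(a), expected) == 0;
}
```

On a typical 32-bit target jump_table_stride() would be 4 and on a typical 64-bit target 8, matching the REG*4 and REG*8 forms, though the exact value depends on the platform.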
