Reverse Engineering for Beginners
Dennis Yurichev
<dennis(a)yurichev.com>
©2013-2015, Dennis Yurichev.
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/.
Text version (May 30, 2016).
The latest version (and Russian edition) of this text is accessible at beginners.re. An e-book reader version is also available.
There is also a LITE-version (introductory short version), intended for those who want a very quick introduction to the basics of reverse engineering: beginners.re
You can also follow me on twitter to get information about updates of this text: @yurichev [1], or subscribe to the mailing list [2].
The cover was made by Andy Nechaevsky: facebook.
[1] twitter.com/yurichev
[2] yurichev.com
Call for translators!
You may want to help me with translating this work into languages other than English and Russian.
Just send me any piece of translated text (no matter how short) and I’ll put it into my LaTeX source code.
Speed isn’t important, because this is an open-source project, after all. Your name will be mentioned as a project contributor.
Korean, Chinese and Persian languages are reserved by publishers.
I do the English and Russian versions myself, but my English is still horrible, so I’m very grateful for any notes about grammar, etc. Even my Russian is flawed, so I’m grateful for notes about the Russian text as well!
So do not hesitate to contact me: dennis(a)yurichev.com.
ABRIDGED CONTENTS ABRIDGED CONTENTS
Abridged contents
I Code patterns 1
II Important fundamentals 429
III Slightly more advanced examples 438
IV Java 585
V Finding important/interesting stuff in the code 623
VI OS-specific 647
VII Tools 701
VIII Examples of real-world RE [3] tasks 707
IX Examples of reversing proprietary file formats 822
X Other things 853
XI Books/blogs worth reading 871
Afterword 876
Appendix 878
Acronyms used 908
[3] Reverse Engineering
CONTENTS CONTENTS
Contents
I Code patterns 1
1 A short introduction to the CPU 3
1.1 A couple of words about different ISAs [4] . . . 3
2 The simplest Function 4
2.1 x86 . . . 4
2.2 ARM . . . 4
2.3 MIPS . . . 5
2.3.1 A note about MIPS instruction/register names . . . 5
3 Hello, world! 6
3.1 x86 . . . 6
3.1.1 MSVC . . . 6
3.1.2 GCC . . . 7
3.1.3 GCC: AT&T syntax . . . 8
3.2 x86-64 . . . 9
3.2.1 MSVC—x86-64 . . . 9
3.2.2 GCC—x86-64 . . . 10
3.3 GCC—one more thing . . . 11
3.4 ARM . . . 11
3.4.1 Non-optimizing Keil 6/2013 (ARM mode) . . . 12
3.4.2 Non-optimizing Keil 6/2013 (Thumb mode) . . . 13
3.4.3 Optimizing Xcode 4.6.3 (LLVM) (ARM mode) . . . 13
3.4.4 Optimizing Xcode 4.6.3 (LLVM) (Thumb-2 mode) . . . 14
3.4.5 ARM64 . . . 16
3.5 MIPS . . . 17
3.5.1 A word about the “global pointer” . . . 17
3.5.2 Optimizing GCC. . . 17
3.5.3 Non-optimizing GCC. . . 19
3.5.4 Role of the stack frame in this example. . . 20
3.5.5 Optimizing GCC: load it into GDB. . . 20
3.6 Conclusion. . . 21
3.7 Exercises . . . 21
4 Function prologue and epilogue 22
4.1 Recursion . . . 22
5 Stack 23
5.1 Why does the stack grow backwards?. . . 23
5.2 What is the stack used for? . . . 24
5.2.1 Save the function’s return address . . . 24
5.2.2 Passing function arguments . . . 25
5.2.3 Local variable storage. . . 26
5.2.4 x86: alloca() function . . . 26
5.2.5 (Windows) SEH . . . 28
5.2.6 Buffer overflow protection . . . 28
5.2.7 Automatic deallocation of data in stack . . . 28
5.3 A typical stack layout . . . 28
5.4 Noise in stack . . . 28
[4] Instruction Set Architecture
5.4.1 MSVC 2013 . . . 32
5.5 Exercises . . . 33
6 printf() with several arguments 34
6.1 x86 . . . 34
6.1.1 x86: 3 arguments . . . 34
6.1.2 x64: 8 arguments . . . 42
6.2 ARM . . . 45
6.2.1 ARM: 3 arguments . . . 45
6.2.2 ARM: 8 arguments . . . 46
6.3 MIPS . . . 50
6.3.1 3 arguments. . . 50
6.3.2 8 arguments. . . 52
6.4 Conclusion. . . 56
6.5 By the way. . . 57
7 scanf() 58
7.1 Simple example . . . 58
7.1.1 About pointers . . . 58
7.1.2 x86 . . . 59
7.1.3 MSVC + OllyDbg . . . 61
7.1.4 x64 . . . 64
7.1.5 ARM . . . 65
7.1.6 MIPS. . . 66
7.2 Global variables . . . 67
7.2.1 MSVC: x86 . . . 67
7.2.2 MSVC: x86 + OllyDbg . . . 69
7.2.3 GCC: x86 . . . 70
7.2.4 MSVC: x64 . . . 70
7.2.5 ARM: Optimizing Keil 6/2013 (Thumb mode) . . . 71
7.2.6 ARM64 . . . 72
7.2.7 MIPS. . . 72
7.3 scanf() result checking . . . 76
7.3.1 MSVC: x86 . . . 76
7.3.2 MSVC: x86: IDA . . . 77
7.3.3 MSVC: x86 + OllyDbg . . . 81
7.3.4 MSVC: x86 + Hiew . . . 83
7.3.5 MSVC: x64 . . . 84
7.3.6 ARM . . . 85
7.3.7 MIPS. . . 86
7.3.8 Exercise. . . 87
7.4 Exercise . . . 87
8 Accessing passed arguments 88
8.1 x86 . . . 88
8.1.1 MSVC . . . 88
8.1.2 MSVC + OllyDbg . . . 89
8.1.3 GCC . . . 89
8.2 x64 . . . 90
8.2.1 MSVC . . . 90
8.2.2 GCC . . . 91
8.2.3 GCC: uint64_t instead of int . . . 92
8.3 ARM . . . 93
8.3.1 Non-optimizing Keil 6/2013 (ARM mode) . . . 93
8.3.2 Optimizing Keil 6/2013 (ARM mode) . . . 94
8.3.3 Optimizing Keil 6/2013 (Thumb mode) . . . 94
8.3.4 ARM64 . . . 94
8.4 MIPS . . . 96
9 More about results returning 97
9.1 Attempt to use the result of a function returning void . . . 97
9.2 What if we do not use the function result? . . . 98
9.3 Returning a structure . . . 98
10 Pointers 100
10.1 Global variables example . . . 100
10.2 Local variables example . . . 106
10.3 Conclusion. . . 109
11 GOTO operator 110
11.1 Dead code . . . 112
11.2 Exercise . . . 113
12 Conditional jumps 114
12.1 Simple example . . . 114
12.1.1 x86 . . . 114
12.1.2 ARM . . . 125
12.1.3 MIPS. . . 128
12.2 Calculating absolute value. . . 131
12.2.1 Optimizing MSVC. . . 131
12.2.2 Optimizing Keil 6/2013: Thumb mode . . . 131
12.2.3 Optimizing Keil 6/2013: ARM mode . . . 131
12.2.4 Non-optimizing GCC 4.9 (ARM64) . . . 132
12.2.5 MIPS. . . 132
12.2.6 Branchless version? . . . 132
12.3 Ternary conditional operator . . . 132
12.3.1 x86 . . . 133
12.3.2 ARM . . . 134
12.3.3 ARM64 . . . 134
12.3.4 MIPS. . . 135
12.3.5 Let’s rewrite it in an if/else way . . . 135
12.3.6 Conclusion. . . 135
12.4 Getting minimal and maximal values . . . 136
12.4.1 32-bit . . . 136
12.4.2 64-bit . . . 138
12.4.3 MIPS. . . 140
12.5 Conclusion. . . 140
12.5.1 x86 . . . 140
12.5.2 ARM . . . 140
12.5.3 MIPS. . . 141
12.5.4 Branchless . . . 141
12.6 Exercise . . . 141
13 switch()/case/default 142
13.1 Small number of cases . . . 142
13.1.1 x86 . . . 142
13.1.2 ARM: Optimizing Keil 6/2013 (ARM mode). . . 152
13.1.3 ARM: Optimizing Keil 6/2013 (Thumb mode) . . . 152
13.1.4 ARM64: Non-optimizing GCC (Linaro) 4.9 . . . 153
13.1.5 ARM64: Optimizing GCC (Linaro) 4.9 . . . 154
13.1.6 MIPS. . . 154
13.1.7 Conclusion. . . 155
13.2 A lot of cases . . . 155
13.2.1 x86 . . . 155
13.2.2 ARM: Optimizing Keil 6/2013 (ARM mode). . . 162
13.2.3 ARM: Optimizing Keil 6/2013 (Thumb mode) . . . 163
13.2.4 MIPS. . . 165
13.2.5 Conclusion. . . 166
13.3 When there are several case statements in one block . . . 167
13.3.1 MSVC . . . 167
13.3.2 GCC . . . 168
13.3.3 ARM64: Optimizing GCC 4.9.1. . . 169
13.4 Fall-through. . . 170
13.4.1 MSVC x86 . . . 171
13.4.2 ARM64 . . . 172
13.5 Exercises . . . 172
13.5.1 Exercise #1 . . . 172
14 Loops 173
14.1 Simple example . . . 173
14.1.1 x86 . . . 173
14.1.2 x86: OllyDbg . . . 177
14.1.3 x86: tracer . . . 177
14.1.4 ARM . . . 179
14.1.5 MIPS. . . 182
14.1.6 One more thing. . . 183
14.2 Memory blocks copying routine . . . 183
14.2.1 Straight-forward implementation . . . 183
14.2.2 ARM in ARM mode . . . 184
14.2.3 MIPS. . . 185
14.2.4 Vectorization . . . 185
14.3 Conclusion. . . 186
14.4 Exercises . . . 187
15 Simple C-strings processing 188
15.1 strlen() . . . 188
15.1.1 x86 . . . 188
15.1.2 ARM . . . 195
15.1.3 MIPS. . . 198
16 Replacing arithmetic instructions to other ones 199
16.1 Multiplication. . . 199
16.1.1 Multiplication using addition . . . 199
16.1.2 Multiplication using shifting. . . 199
16.1.3 Multiplication using shifting, subtracting, and adding . . . 200
16.2 Division . . . 204
16.2.1 Division using shifts . . . 204
16.3 Exercise . . . 204
17 Floating-point unit 205
17.1 IEEE 754 . . . 205
17.2 x86 . . . 205
17.3 ARM, MIPS, x86/x64 SIMD . . . 205
17.4 C/C++ . . . 205
17.5 Simple example . . . 206
17.5.1 x86 . . . 206
17.5.2 ARM: Optimizing Xcode 4.6.3 (LLVM) (ARM mode) . . . 213
17.5.3 ARM: Optimizing Keil 6/2013 (Thumb mode) . . . 214
17.5.4 ARM64: Optimizing GCC (Linaro) 4.9 . . . 214
17.5.5 ARM64: Non-optimizing GCC (Linaro) 4.9 . . . 215
17.5.6 MIPS. . . 216
17.6 Passing floating point numbers via arguments . . . 216
17.6.1 x86 . . . 217
17.6.2 ARM + Non-optimizing Xcode 4.6.3 (LLVM) (Thumb-2 mode) . . . 217
17.6.3 ARM + Non-optimizing Keil 6/2013 (ARM mode) . . . 218
17.6.4 ARM64 + Optimizing GCC (Linaro) 4.9 . . . 218
17.6.5 MIPS. . . 219
17.7 Comparison example . . . 220
17.7.1 x86 . . . 220
17.7.2 ARM . . . 247
17.7.3 ARM64 . . . 250
17.7.4 MIPS. . . 252
17.8 Stack, calculators and reverse Polish notation . . . 252
17.9 x64 . . . 252
17.10 Exercises . . . 252
18 Arrays 253
18.1 Simple example . . . 253
18.1.1 x86 . . . 253
18.1.2 ARM . . . 256
18.1.3 MIPS. . . 259
18.2 Buffer overflow . . . 260
18.2.1 Reading outside array bounds. . . 260
18.2.2 Writing beyond array bounds . . . 263
18.3 Buffer overflow protection methods . . . 268
18.3.1 Optimizing Xcode 4.6.3 (LLVM) (Thumb-2 mode) . . . 269
18.4 One more word about arrays . . . 271
18.5 Array of pointers to strings . . . 271
18.5.1 x64 . . . 272
18.5.2 32-bit ARM. . . 273
18.5.3 ARM64 . . . 274
18.5.4 MIPS. . . 275
18.5.5 Array overflow . . . 275
18.6 Multidimensional arrays . . . 278
18.6.1 Two-dimensional array example . . . 278
18.6.2 Access two-dimensional array as one-dimensional . . . 279
18.6.3 Three-dimensional array example . . . 281
18.6.4 More examples . . . 284
18.7 Pack of strings as a two-dimensional array . . . 284
18.7.1 32-bit ARM. . . 286
18.7.2 ARM64 . . . 286
18.7.3 MIPS. . . 287
18.7.4 Conclusion. . . 287
18.8 Conclusion. . . 288
18.9 Exercises . . . 288
19 Manipulating specific bit(s) 289
19.1 Specific bit checking . . . 289
19.1.1 x86 . . . 289
19.1.2 ARM . . . 291
19.2 Setting and clearing specific bits. . . 292
19.2.1 x86 . . . 293
19.2.2 ARM + Optimizing Keil 6/2013 (ARM mode) . . . 298
19.2.3 ARM + Optimizing Keil 6/2013 (Thumb mode) . . . 299
19.2.4 ARM + Optimizing Xcode 4.6.3 (LLVM) (ARM mode) . . . 299
19.2.5 ARM: more about the BIC instruction. . . 299
19.2.6 ARM64: Optimizing GCC (Linaro) 4.9 . . . 299
19.2.7 ARM64: Non-optimizing GCC (Linaro) 4.9 . . . 300
19.2.8 MIPS. . . 300
19.3 Shifts . . . 300
19.4 Setting and clearing specific bits: FPU [5] example. . . 300
19.4.1 A word about the XOR operation . . . 301
19.4.2 x86 . . . 301
19.4.3 MIPS. . . 303
19.4.4 ARM . . . 303
19.5 Counting bits set to 1 . . . 305
19.5.1 x86 . . . 306
19.5.2 x64 . . . 314
19.5.3 ARM + Optimizing Xcode 4.6.3 (LLVM) (ARM mode) . . . 316
19.5.4 ARM + Optimizing Xcode 4.6.3 (LLVM) (Thumb-2 mode) . . . 317
19.5.5 ARM64 + Optimizing GCC 4.9 . . . 317
19.5.6 ARM64 + Non-optimizing GCC 4.9 . . . 317
19.5.7 MIPS. . . 318
19.6 Conclusion. . . 320
19.6.1 Check for specific bit (known at compile stage) . . . 320
19.6.2 Check for specific bit (specified at runtime) . . . 320
19.6.3 Set specific bit (known at compile stage) . . . 321
19.6.4 Set specific bit (specified at runtime) . . . 321
19.6.5 Clear specific bit (known at compile stage) . . . 321
19.6.6 Clear specific bit (specified at runtime). . . 322
19.7 Exercises . . . 322
20 Linear congruential generator 323
20.1 x86 . . . 323
20.2 x64 . . . 324
20.3 32-bit ARM . . . 325
[5] Floating-point unit
20.4 MIPS . . . 325
20.4.1 MIPS relocations . . . 326
20.5 Thread-safe version of the example . . . 327
21 Structures 328
21.1 MSVC: SYSTEMTIME example . . . 328
21.1.1 OllyDbg. . . 330
21.1.2 Replacing the structure with array . . . 330
21.2 Let’s allocate space for a structure using malloc() . . . 331
21.3 UNIX: struct tm . . . 333
21.3.1 Linux . . . 333
21.3.2 ARM . . . 336
21.3.3 MIPS. . . 337
21.3.4 Structure as a set of values . . . 339
21.3.5 Structure as an array of 32-bit words . . . 340
21.3.6 Structure as an array of bytes . . . 341
21.4 Fields packing in structure. . . 343
21.4.1 x86 . . . 343
21.4.2 ARM . . . 347
21.4.3 MIPS. . . 348
21.4.4 One more word . . . 349
21.5 Nested structures . . . 349
21.5.1 OllyDbg. . . 351
21.6 Bit fields in a structure . . . 351
21.6.1 CPUID example . . . 351
21.6.2 Working with the float type as with a structure . . . 355
21.7 Exercises . . . 358
22 Unions 359
22.1 Pseudo-random number generator example. . . 359
22.1.1 x86 . . . 360
22.1.2 MIPS. . . 361
22.1.3 ARM (ARM mode). . . 362
22.2 Calculating machine epsilon . . . 363
22.2.1 x86 . . . 364
22.2.2 ARM64 . . . 364
22.2.3 MIPS. . . 365
22.2.4 Conclusion. . . 365
22.3 Fast square root calculation. . . 365
23 Pointers to functions 367
23.1 MSVC . . . 368
23.1.1 MSVC + OllyDbg . . . 370
23.1.2 MSVC + tracer . . . 372
23.1.3 MSVC + tracer (code coverage) . . . 374
23.2 GCC . . . 374
23.2.1 GCC + GDB (with source code). . . 375
23.2.2 GCC + GDB (no source code) . . . 376
24 64-bit values in 32-bit environment 379
24.1 Returning of 64-bit value . . . 379
24.1.1 x86 . . . 379
24.1.2 ARM . . . 379
24.1.3 MIPS. . . 380
24.2 Arguments passing, addition, subtraction . . . 380
24.2.1 x86 . . . 380
24.2.2 ARM . . . 381
24.2.3 MIPS. . . 382
24.3 Multiplication, division . . . 383
24.3.1 x86 . . . 383
24.3.2 ARM . . . 385
24.3.3 MIPS. . . 386
24.4 Shifting right . . . 387
24.4.1 x86 . . . 387
24.4.2 ARM . . . 387
24.4.3 MIPS. . . 388
24.5 Converting 32-bit value into 64-bit one . . . 388
24.5.1 x86 . . . 388
24.5.2 ARM . . . 388
24.5.3 MIPS. . . 389
25 SIMD 390
25.1 Vectorization . . . 390
25.1.1 Addition example . . . 391
25.1.2 Memory copy example . . . 396
25.2 SIMD strlen() implementation . . . 400
26 64 bits 403
26.1 x86-64 . . . 403
26.2 ARM . . . 409
26.3 Float point numbers. . . 410
27 Working with floating point numbers using SIMD 411
27.1 Simple example . . . 411
27.1.1 x64 . . . 411
27.1.2 x86 . . . 412
27.2 Passing floating point number via arguments. . . 419
27.3 Comparison example . . . 420
27.3.1 x64 . . . 420
27.3.2 x86 . . . 421
27.4 Calculating machine epsilon: x64 and SIMD . . . 421
27.5 Pseudo-random number generator example revisited. . . 422
27.6 Summary. . . 422
28 ARM-specific details 424
28.1 Number sign (#) before number . . . 424
28.2 Addressing modes . . . 424
28.3 Loading a constant into a register . . . 425
28.3.1 32-bit ARM. . . 425
28.3.2 ARM64 . . . 425
28.4 Relocs in ARM64 . . . 426
29 MIPS-specific details 428
29.1 Loading constants into register . . . 428
29.2 Further reading about MIPS . . . 428
II Important fundamentals 429
30 Signed number representations 431
31 Endianness 433
31.1 Big-endian. . . 433
31.2 Little-endian . . . 433
31.3 Example . . . 433
31.4 Bi-endian . . . 434
31.5 Converting data . . . 434
32 Memory 435
33 CPU 436
33.1 Branch predictors . . . 436
33.2 Data dependencies . . . 436
34 Hash functions 437
34.1 How do one-way functions work? . . . 437
III Slightly more advanced examples 438
35 Temperature converting 439
35.1 Integer values. . . 439
35.1.1 Optimizing MSVC 2012 x86 . . . 439
35.1.2 Optimizing MSVC 2012 x64 . . . 441
35.2 Floating-point values . . . 441
36 Fibonacci numbers 444
36.1 Example #1 . . . 444
36.2 Example #2 . . . 447
36.3 Summary. . . 450
37 CRC32 calculation example 451
38 Network address calculation example 454
38.1 calc_network_address() . . . 455
38.2 form_IP() . . . 456
38.3 print_as_IP() . . . 457
38.4 form_netmask() and set_bit() . . . 458
38.5 Summary. . . 459
39 Loops: several iterators 460
39.1 Three iterators . . . 460
39.2 Two iterators . . . 461
39.3 Intel C++ 2011 case . . . 462
40 Duff’s device 465
41 Division by 9 468
41.1 x86 . . . 468
41.2 ARM . . . 469
41.2.1 Optimizing Xcode 4.6.3 (LLVM) (ARM mode) . . . 469
41.2.2 Optimizing Xcode 4.6.3 (LLVM) (Thumb-2 mode) . . . 470
41.2.3 Non-optimizing Xcode 4.6.3 (LLVM) and Keil 6/2013 . . . 470
41.3 MIPS . . . 470
41.4 How it works . . . 471
41.4.1 More theory . . . 472
41.5 Getting the divisor. . . 472
41.5.1 Variant #1 . . . 472
41.5.2 Variant #2 . . . 473
41.6 Exercise . . . 473
42 String to number conversion (atoi()) 474
42.1 Simple example . . . 474
42.1.1 Optimizing MSVC 2013 x64 . . . 474
42.1.2 Optimizing GCC 4.9.1 x64 . . . 475
42.1.3 Optimizing Keil 6/2013 (ARM mode) . . . 475
42.1.4 Optimizing Keil 6/2013 (Thumb mode) . . . 476
42.1.5 Optimizing GCC 4.9.1 ARM64 . . . 476
42.2 A slightly advanced example . . . 477
42.2.1 Optimizing GCC 4.9.1 x64 . . . 478
42.2.2 Optimizing Keil 6/2013 (ARM mode) . . . 479
42.3 Exercise . . . 480
43 Inline functions 481
43.1 Strings and memory functions . . . 482
43.1.1 strcmp(). . . 482
43.1.2 strlen() . . . 484
43.1.3 strcpy() . . . 484
43.1.4 memset() . . . 484
43.1.5 memcpy(). . . 486
43.1.6 memcmp() . . . 488
43.1.7 IDA script. . . 489
44 C99 restrict 490
45 Branchless abs() function 493
45.1 Optimizing GCC 4.9.1 x64 . . . 493
45.2 Optimizing GCC 4.9 ARM64 . . . 494
46 Variadic functions 495
46.1 Computing arithmetic mean. . . 495
46.1.1 cdecl calling conventions. . . 495
46.1.2 Register-based calling conventions . . . 496
46.2 vprintf() function case . . . 498
47 Strings trimming 500
47.1 x64: Optimizing MSVC 2013 . . . 501
47.2 x64: Non-optimizing GCC 4.9.1 . . . 502
47.3 x64: Optimizing GCC 4.9.1 . . . 503
47.4 ARM64: Non-optimizing GCC (Linaro) 4.9 . . . 504
47.5 ARM64: Optimizing GCC (Linaro) 4.9 . . . 505
47.6 ARM: Optimizing Keil 6/2013 (ARM mode). . . 506
47.7 ARM: Optimizing Keil 6/2013 (Thumb mode) . . . 506
47.8 MIPS . . . 507
48 toupper() function 509
48.1 x64 . . . 509
48.1.1 Two comparison operations . . . 509
48.1.2 One comparison operation. . . 510
48.2 ARM . . . 511
48.2.1 GCC for ARM64 . . . 511
48.3 Summary. . . 512
49 Incorrectly disassembled code 513
49.1 Disassembling from an incorrect start (x86) . . . 513
49.2 How does random noise look disassembled? . . . 514
50 Obfuscation 518
50.1 Text strings . . . 518
50.2 Executable code . . . 519
50.2.1 Inserting garbage . . . 519
50.2.2 Replacing instructions with bloated equivalents . . . 519
50.2.3 Always executed/never executed code . . . 519
50.2.4 Making a lot of mess . . . 519
50.2.5 Using indirect pointers . . . 520
50.3 Virtual machine / pseudo-code . . . 520
50.4 Other things to mention . . . 520
50.5 Exercise . . . 520
51 C++ 521
51.1 Classes . . . 521
51.1.1 A simple example . . . 521
51.1.2 Class inheritance . . . 527
51.1.3 Encapsulation. . . 530
51.1.4 Multiple inheritance. . . 531
51.1.5 Virtual methods . . . 534
51.2 ostream . . . 537
51.3 References . . . 538
51.4 STL . . . 538
51.4.1 std::string. . . 538
51.4.2 std::list . . . 545
51.4.3 std::vector . . . 554
51.4.4 std::map and std::set. . . 561
52 Negative array indices 571
53 Windows 16-bit 574
53.1 Example #1 . . . 574
53.2 Example #2 . . . 574
53.3 Example #3 . . . 575
53.4 Example #4 . . . 576
53.5 Example #5 . . . 578
53.6 Example #6 . . . 582
53.6.1 Global variables . . . 583
IV Java 585
54 Java 586
54.1 Introduction . . . 586
54.2 Returning a value . . . 586
54.3 Simple calculating functions . . . 590
54.4 JVM [6] memory model . . . 593
54.5 Simple function calling. . . 593
54.6 Calling beep() . . . 594
54.7 Linear congruential PRNG [7] . . . 595
54.8 Conditional jumps . . . 596
54.9 Passing arguments. . . 598
54.10 Bitfields . . . 599
54.11 Loops . . . 600
54.12 switch() . . . 602
54.13 Arrays . . . 603
54.13.1 Simple example . . . 603
54.13.2 Summing elements of array . . . 604
54.13.3 The only argument of the main() function is an array too . . . 604
54.13.4 Pre-initialized array of strings. . . 605
54.13.5 Variadic functions . . . 607
54.13.6 Two-dimensional arrays. . . 609
54.13.7 Three-dimensional arrays . . . 609
54.13.8 Summary . . . 610
54.14 Strings . . . 610
54.14.1 First example . . . 610
54.14.2 Second example . . . 611
54.15 Exceptions . . . 612
54.16 Classes . . . 615
54.17 Simple patching . . . 617
54.17.1 First example . . . 617
54.17.2 Second example . . . 619
54.18 Summary. . . 622
V Finding important/interesting stuff in the code 623
55 Identification of executable files 625
55.1 Microsoft Visual C++. . . 625
55.1.1 Name mangling. . . 625
55.2 GCC . . . 625
55.2.1 Name mangling. . . 625
55.2.2 Cygwin . . . 625
55.2.3 MinGW . . . 625
55.3 Intel FORTRAN . . . 626
55.4 Watcom, OpenWatcom . . . 626
55.4.1 Name mangling. . . 626
55.5 Borland . . . 626
55.5.1 Delphi. . . 626
55.6 Other known DLLs . . . 627
56 Communication with the outer world (win32) 628
56.1 Often used functions in the Windows API . . . 628
56.2 tracer: Intercepting all functions in specific module. . . 629
57 Strings 630
57.1 Text strings . . . 630
57.1.1 C/C++ . . . 630
57.1.2 Borland Delphi . . . 630
[6] Java virtual machine
[7] Pseudorandom number generator
57.1.3 Unicode. . . 631
57.1.4 Base64 . . . 633
57.2 Error/debug messages . . . 634
57.3 Suspicious magic strings . . . 634
58 Calls to assert() 635
59 Constants 636
59.1 Magic numbers . . . 636
59.1.1 Dates . . . 637
59.1.2 DHCP . . . 637
59.2 Searching for constants. . . 638
60 Finding the right instructions 639
61 Suspicious code patterns 641
61.1 XOR instructions . . . 641
61.2 Hand-written assembly code . . . 641
62 Using magic numbers while tracing 643
63 Other things 644
63.1 General idea. . . 644
63.2 C++ . . . 644
63.3 Some binary file patterns . . . 644
63.4 Memory “snapshots” comparing . . . 645
63.4.1 Windows registry . . . 646
63.4.2 Blink-comparator . . . 646
VI OS-specific 647
64 Arguments passing methods (calling conventions) 648
64.1 cdecl . . . 648
64.2 stdcall . . . 648
64.2.1 Functions with variable number of arguments . . . 649
64.3 fastcall . . . 649
64.3.1 GCC regparm . . . 650
64.3.2 Watcom/OpenWatcom. . . 650
64.4 thiscall . . . 650
64.5 x86-64 . . . 650
64.5.1 Windows x64 . . . 650
64.5.2 Linux x64 . . . 653
64.6 Return values of float and double type . . . 653
64.7 Modifying arguments . . . 653
64.8 Taking a pointer to function argument . . . 654
65 Thread Local Storage 656
65.1 Linear congruential generator revisited . . . 656
65.1.1 Win32. . . 656
65.1.2 Linux . . . 660
66 System calls (syscall-s) 661
66.1 Linux . . . 661
66.2 Windows . . . 662
67 Linux 663
67.1 Position-independent code . . . 663
67.1.1 Windows . . . 665
67.2 LD_PRELOAD hack in Linux . . . 665
68 Windows NT 668
68.1 CRT (win32) . . . 668
68.2 Win32 PE. . . 671
68.2.1 Terminology . . . 671
68.2.2 Base address . . . 672
68.2.3 Subsystem . . . 672
68.2.4 OS version . . . 672
68.2.5 Sections . . . 673
68.2.6 Relocations (relocs) . . . 673
68.2.7 Exports and imports. . . 674
68.2.8 Resources . . . 676
68.2.9 .NET . . . 676
68.2.10 TLS . . . 677
68.2.11 Tools. . . 677
68.2.12 Further reading . . . 677
68.3 Windows SEH . . . 677
68.3.1 Let’s forget about MSVC. . . 677
68.3.2 Now let’s get back to MSVC . . . 682
68.3.3 Windows x64 . . . 695
68.3.4 Read more about SEH . . . 699
68.4 Windows NT: Critical section . . . 699
VII Tools 701
69 Disassembler 702
69.1 IDA . . . 702
70 Debugger 703
70.1 OllyDbg . . . 703
70.2 GDB . . . 703
70.3 tracer . . . 703
71 System calls tracing 704
71.0.1 strace / dtruss. . . 704
72 Decompilers 705
73 Other tools 706
VIII Examples of real-world RE tasks 707
74 Task manager practical joke (Windows Vista) 709
74.1 Using LEA to load values. . . 711
75 Color Lines game practical joke 713
76 Minesweeper (Windows XP) 717
76.1 Exercises . . . 721
77 Hand decompiling + Z3 SMT solver 722
77.1 Hand decompiling . . . 722
77.2 Now let’s use the Z3 SMT solver . . . 725
78 Dongles 730
78.1 Example #1: MacOS Classic and PowerPC . . . 730
78.2 Example #2: SCO OpenServer. . . 737
78.2.1 Decrypting error messages. . . 744
78.3 Example #3: MS-DOS . . . 746
79 “QR9”: Rubik’s cube inspired amateur crypto-algorithm 752
80 SAP 779
80.1 About SAP client network traffic compression . . . 779
80.2 SAP 6.0 password checking functions. . . 789
81 Oracle RDBMS 794
81.1 V$VERSION table in the Oracle RDBMS . . . 794
81.2 X$KSMLRU table in Oracle RDBMS . . . 801
81.3 V$TIMER table in Oracle RDBMS . . . 803
82 Handwritten assembly code 807
82.1 EICAR test file . . . 807
83 Demos 809
83.1 10 PRINT CHR$(205.5+RND(1)); : GOTO 10 . . . 809
83.1.1 Trixter’s 42 byte version. . . 809
83.1.2 My attempt to reduce Trixter’s version: 27 bytes. . . 810
83.1.3 Taking random memory garbage as a source of randomness . . . 810
83.1.4 Conclusion. . . 811
83.2 Mandelbrot set . . . 812
83.2.1 Theory . . . 813
83.2.2 Let’s get back to the demo . . . 818
83.2.3 My “fixed” version . . . 820
IX Examples of reversing proprietary file formats 822
84 Primitive XOR-encryption 823
84.1 Norton Guide: simplest possible 1-byte XOR encryption . . . 824
84.1.1 Entropy . . . 825
84.2 Simplest possible 4-byte XOR encryption . . . 827
84.2.1 Exercise. . . 830
85 Millenium game save file 831
86 Oracle RDBMS: .SYM-files 838
87 Oracle RDBMS: .MSB-files 847
87.1 Summary. . . 852
X Other things 853
88 npad 854
89 Executable files patching 856
89.1 Text strings . . . 856
89.2 x86 code . . . 856
90 Compiler intrinsic 857
91 Compiler’s anomalies 858
92 OpenMP 859
92.1 MSVC . . . 861
92.2 GCC . . . 862
93 Itanium 865
94 8086 memory model 868
95 Basic blocks reordering 869
95.1 Profile-guided optimization . . . 869
XI Books/blogs worth reading 871
96 Books 872
96.1 Windows . . . 872
96.2 C/C++ . . . 872
96.3 x86 / x86-64 . . . 872
96.4 ARM . . . 872
96.5 Cryptography . . . 872
97 Blogs 873
97.1 Windows . . . 873
98 Other 874
Afterword 876
99 Questions? 876
Appendix 878
A x86 878
A.1 Terminology . . . 878
A.2 General purpose registers . . . 878
A.2.1 RAX/EAX/AX/AL. . . 878
A.2.2 RBX/EBX/BX/BL . . . 879
A.2.3 RCX/ECX/CX/CL. . . 879
A.2.4 RDX/EDX/DX/DL . . . 879
A.2.5 RSI/ESI/SI/SIL. . . 879
A.2.6 RDI/EDI/DI/DIL . . . 879
A.2.7 R8/R8D/R8W/R8L . . . 879
A.2.8 R9/R9D/R9W/R9L . . . 880
A.2.9 R10/R10D/R10W/R10L . . . 880
A.2.10 R11/R11D/R11W/R11L . . . 880
A.2.11 R12/R12D/R12W/R12L . . . 880
A.2.12 R13/R13D/R13W/R13L . . . 880
A.2.13 R14/R14D/R14W/R14L . . . 880
A.2.14 R15/R15D/R15W/R15L . . . 880
A.2.15 RSP/ESP/SP/SPL . . . 881
A.2.16 RBP/EBP/BP/BPL . . . 881
A.2.17 RIP/EIP/IP . . . 881
A.2.18 CS/DS/ES/SS/FS/GS . . . 881
A.2.19 Flags register . . . 881
A.3 FPU registers . . . 882
A.3.1 Control Word . . . 882
A.3.2 Status Word . . . 883
A.3.3 Tag Word . . . 883
A.4 SIMD registers . . . 884
A.4.1 MMX registers. . . 884
A.4.2 SSE and AVX registers. . . 884
A.5 Debugging registers. . . 884
A.5.1 DR6 . . . 884
A.5.2 DR7 . . . 884
A.6 Instructions . . . 885
A.6.1 Prefixes . . . 885
A.6.2 Most frequently used instructions . . . 886
A.6.3 Less frequently used instructions. . . 890
A.6.4 FPU instructions . . . 894
A.6.5 Instructions having printable ASCII opcode . . . 895
B ARM 897
B.1 Terminology . . . 897
B.2 Versions . . . 897
B.3 32-bit ARM (AArch32). . . 897
B.3.1 General purpose registers . . . 897
B.3.2 Current Program Status Register (CPSR) . . . 898
B.3.3 VFP (floating point) and NEON registers . . . 898
B.4 64-bit ARM (AArch64). . . 898
B.4.1 General purpose registers . . . 898
B.5 Instructions . . . 899
B.5.1 Conditional codes table. . . 899
C MIPS 900
C.1 Registers . . . 900
C.1.1 General purpose registers GPR [8]. . . 900
[8] General Purpose Registers
C.1.2 Floating-point registers. . . 900
C.2 Instructions . . . 900
C.2.1 Jump instructions. . . 901
D Some GCC library functions 902
E Some MSVC library functions 903
F Cheatsheets 904
F.1 IDA . . . 904
F.2 OllyDbg . . . 904
F.3 MSVC . . . 905
F.4 GCC . . . 905
F.5 GDB . . . 905
Acronyms used 908
Glossary 912
Index 914
Bibliography 920
Preface
There are several popular meanings of the term “reverse engineering”: 1) The reverse engineering of software: researching compiled programs; 2) The scanning of 3D structures and the subsequent digital manipulation required in order to duplicate them; 3) Recreating DBMS [9] structure. This book is about the first meaning.
Topics discussed in-depth
x86/x64, ARM/ARM64, MIPS, Java/JVM.
Topics touched upon
Oracle RDBMS (81 on page 794), Itanium (93 on page 865), copy-protection dongles (78 on page 730), LD_PRELOAD (67.2 on page 665), stack overflow, ELF [10], win32 PE file format (68.2 on page 671), x86-64 (26.1 on page 403), critical sections (68.4 on page 699), syscalls (66 on page 661), TLS [11], position-independent code (PIC [12]) (67.1 on page 663), profile-guided optimization (95.1 on page 869), C++ STL (51.4 on page 538), OpenMP (92 on page 859), SEH (68.3 on page 677).
Exercises and tasks
…are all moved to the separate website: http://challenges.re.
About the author
Dennis Yurichev is an experienced reverse engineer and programmer. He can be contacted by email: dennis(a)yurichev.com, or on Skype: dennis.yurichev.
Praise for Reverse Engineering for Beginners
• “It’s very well done .. and for free .. amazing.” [13] Daniel Bilar, Siege Technologies, LLC.
• “... excellent and free” [14] Pete Finnigan, Oracle RDBMS security guru.
• “... book is interesting, great job!” Michael Sikorski, author of Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software.
• “... my compliments for the very nice tutorial!” Herbert Bos, full professor at the Vrije Universiteit Amsterdam, co-author of Modern Operating Systems (4th Edition).
• “... It is amazing and unbelievable.” Luis Rocha, CISSP / ISSAP, Technical Manager, Network & Information Security at Verizon Business.
• “Thanks for the great work and your book.” Joris van de Vis, SAP Netweaver & Security specialist.
• “... reasonable intro to some of the techniques.” [15] Mike Stay, teacher at the Federal Law Enforcement Training Center, Georgia, US.
[9] Database management systems
[10] Executable file format widely used in *NIX systems including Linux
[11] Thread Local Storage
[12] Position Independent Code: 67.1 on page 663
[13] twitter.com/daniel_bilar/status/436578617221742593
[14] twitter.com/petefinnigan/status/400551705797869568
[15] reddit
• “I love this book! I have several students reading it at the moment, plan to use it in graduate course.” [16] Sergey Bratus, Research Assistant Professor at the Computer Science Department at Dartmouth College.
• “Dennis @Yurichev has published an impressive (and free!) book on reverse engineering” [17] Tanel Poder, Oracle RDBMS performance tuning expert.
• “This book is some kind of Wikipedia to beginners...” Archer, Chinese Translator, IT Security Researcher.
Thanks
For patiently answering all my questions: Andrey “herm1t” Baranovich, Slava “Avid” Kazakov.
For sending me notes about mistakes and inaccuracies: Stanislav “Beaver” Bobrytskyy, Alexander Lysenko, Shell Rocket, Zhu Ruijin, Changmin Heo.
For helping me in other ways: Andrew Zubinski, Arnaud Patard (rtp on #debian-arm IRC), Aliaksandr Autayeu.
For translating the book into Simplified Chinese: Antiy Labs (antiy.cn), Archer.
For translating the book into Korean: Byungho Min.
For translating the book into Dutch: Cedric Sambre (AKA Midas).
For translating the book into Spanish: Diego Boy, Luis Alberto Espinosa Calvo.
For translating the book into Portuguese: Thales Stevan de A. Gois.
For proofreading: Alexander “Lstar” Chernenkiy, Vladimir Botov, Andrei Brazhuk, Mark “Logxen” Cooper, Yuan Jochen Kang, Mal Malakov, Lewis Porter, Jarle Thorsen.
Vasil Kolev did a great amount of work in proofreading and correcting many mistakes.
For illustrations and cover art: Andy Nechaevsky.
Thanks also to all the folks on github.com who have contributed notes and corrections.
Many LaTeX packages were used: I would like to thank the authors as well.
Donors
Those who supported me during the time when I wrote a significant part of the book:
2 * Oleg Vygovsky (50+100 UAH), Daniel Bilar ($50), James Truscott ($4.5), Luis Rocha ($63), Joris van de Vis ($127), Richard S Shultz ($20), Jang Minchang ($20), Shade Atlas (5 AUD), Yao Xiao ($10), Pawel Szczur (40 CHF), Justin Simms ($20), Shawn the R0ck ($27), Ki Chan Ahn ($50), Triop AB (100 SEK), Ange Albertini (€10+50), Sergey Lukianov (300 RUR), Ludvig Gislason (200 SEK), Gérard Labadie (€40), Sergey Volchkov (10 AUD), Vankayala Vigneswararao ($50), Philippe Teuwen ($4), Martin Haeberli ($10), Victor Cazacov (€5), Tobias Sturzenegger (10 CHF), Sonny Thai ($15), Bayna AlZaabi ($75), Redfive B.V. (€25), Joona Oskari Heikkilä (€5), Marshall Bishop ($50), Nicolas Werner (€12), Jeremy Brown ($100), Alexandre Borges ($25), Vladimir Dikovski (€50), Jiarui Hong (100.00 SEK), Jim Di (500 RUR), Tan Vincent ($30), Sri Harsha Kandrakota (10 AUD), Pillay Harish (10 SGD), Timur Valiev (230 RUR), Carlos Garcia Prado (€10), Salikov Alexander (500 RUR), Oliver Whitehouse (30 GBP), Katy Moe ($14), Maxim Dyakonov ($3), Sebastian Aguilera (€20), Hans-Martin Münch (€15), Jarle Thorsen (100 NOK), Vitaly Osipov ($100), Yuri Romanov (1000 RUR), Aliaksandr Autayeu (€10), Tudor Azoitei ($40), Z0vsky (€10), Yu Dai ($10).
Thanks a lot to every donor!
mini-FAQ
Q: Why should one learn assembly language these days?
A: Unless you are an OS18 developer, you probably don’t need to code in assembly: modern compilers are much better at performing optimizations than humans19.
Also, modern CPUs20 are very complex devices and assembly knowledge doesn’t really help one to understand their internals.
That being said, there are at least two areas where a good understanding of assembly can be helpful: First and foremost, security/malware research. It is also a good way to gain a better understanding of your compiled code whilst debugging.
This book is therefore intended for those who want to understand assembly language rather than to code in it, which is why there are many examples of compiler output contained within.
16twitter.com/sergeybratus/status/505590326560833536
17twitter.com/TanelPoder/status/524668104065159169
18Operating System
19A very good text about this topic: [Fog13b]
20Central processing unit
Q: I clicked on a hyperlink inside a PDF-document, how do I go back?
A: In Adobe Acrobat Reader press Alt+LeftArrow.
Q: Your book is huge! Is there anything shorter?
A: A shortened (lite) version can be found here: http://beginners.re/#lite.
Q: I’m not sure if I should try to learn reverse engineering or not.
A: The average time to become familiar with the contents of the shortened LITE-version should be about 1-2 months. You may also try the reverse engineering challenges.
Q: May I print this book / use it for teaching?
A: Of course! That’s why the book is licensed under the Creative Commons license. If you also want to build your own version of the book, read here to find out more.
Q: Why is this book free? You’ve done a great job. This is suspicious, like many other free things.
A: In my own experience, authors of technical literature do this mostly for self-advertisement purposes. It’s not possible to get any decent money from such work.
Q: How does one get a job in reverse engineering?
A: There are hiring threads that appear from time to time on reddit, devoted to RE21 (2013 Q3, 2014). Try looking there.
A somewhat related hiring thread can be found in the “netsec” subreddit: 2014 Q2.
Q: I have a question...
A: Send it to me by email (dennis(a)yurichev.com).
About the Korean translation
In January 2015, the Acorn publishing company (www.acornpub.co.kr) in South Korea did a huge amount of work in translating and publishing my book (as it was in August 2014) into Korean.
It’s now available at their website.
The translator is Byungho Min (twitter/tais9).
The cover art was done by my artistic friend, Andy Nechaevsky: facebook/andydinka.
They also hold the copyright to the Korean translation.
So, if you want to have a real book on your shelf in Korean and want to support my work, it is now available for purchase.
21reddit.com/r/ReverseEngineering/
Part I
Code patterns
Everything is comprehended in comparison
Author unknown
When the author of this book first started learning C and, later, C++, he used to write small pieces of code, compile them, and then look at the assembly language output. This made it very easy for him to understand what was going on in the code that he had written22. He did it so many times that the relationship between the C/C++ code and what the compiler produced was imprinted deeply in his mind. It became easy for him to imagine instantly a rough outline of the C code’s appearance and function. Perhaps this technique could be helpful for others.
Sometimes ancient compilers are used here, in order to get the shortest (or simplest) possible code snippet.
Exercises
When the author of this book studied assembly language, he also often compiled small C-functions and then rewrote them gradually to assembly, trying to make their code as short as possible. This probably is not worth doing in real-world scenarios today, because it’s hard to compete with modern compilers in terms of efficiency. It is, however, a very good way to gain a better understanding of assembly. Feel free, therefore, to take any assembly code from this book and try to make it shorter.
However, don’t forget to test what you have written.
Optimization levels and debug information
Source code can be compiled by different compilers with various optimization levels. A typical compiler has about three such levels, where level zero means that optimization is disabled. Optimization can also be targeted towards code size or code speed. A non-optimizing compiler is faster and produces more understandable (albeit verbose) code, whereas an optimizing compiler is slower and tries to produce code that runs faster (but is not necessarily more compact). In addition to optimization levels and direction, a compiler can include in the resulting file some debug information, thus producing code for easy debugging.
One of the important features of the “debug” code is that it might contain links between each line of the source code and the respective machine code addresses. Optimizing compilers, on the other hand, tend to produce output where entire lines of source code can be optimized away and thus not even be present in the resulting machine code. Reverse engineers can encounter either version, simply because some developers turn on the compiler’s optimization flags and others do not.
Because of this, we’ll try to work on examples of both debug and release versions of the code featured in this book, where possible.
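As a small illustration of lines being "optimized away", here is a sketch in C. The function name sum_to_three() is invented for this example, and the comments assume GCC, whose -O0, -O2, -g and -S options select the optimization level, debug information and assembly output. What a particular compiler version actually emits may differ.

```c
/* Hypothetical example: compare the output of
 *     gcc -O0 -g -S example.c     (no optimization, with debug information)
 *     gcc -O2 -S example.c        (optimizing)
 * The file name example.c is an assumption made for illustration. */
int sum_to_three(void)
{
    int unused = 42;            /* dead store: an optimizing compiler may drop it */
    int total = 0;

    for (int i = 1; i <= 3; i++)
        total += i;             /* an optimizer may fold the whole loop into 6 */

    (void)unused;               /* silence "unused variable" warnings */
    return total;
}
```

In the non-optimized output one would expect to find code for every source line, while in the optimized output the function may shrink to little more than loading a constant and returning.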
22In fact, he still does it when he can’t understand what a particular bit of code does.
Chapter 1
A short introduction to the CPU
The CPU is the device that executes the machine code a program consists of.
A short glossary:
Instruction : A primitive CPU command. The simplest examples include: moving data between registers, working with memory, primitive arithmetic operations. As a rule, each CPU has its own instruction set architecture (ISA).
Machine code : Code that the CPU directly processes. Each instruction is usually encoded by several bytes.
Assembly language : Mnemonic code and some extensions like macros that are intended to make a programmer’s life easier.
CPU register : Each CPU has a fixed set of general purpose registers (GPR). ≈ 8 in x86, ≈ 16 in x86-64, ≈ 16 in ARM. The easiest way to understand a register is to think of it as an untyped temporary variable. Imagine if you were working with a high-level PL1 and could only use eight 32-bit (or 64-bit) variables. Yet a lot can be done using just these!
One might wonder why there needs to be a difference between machine code and a PL. The answer lies in the fact that humans and CPUs are not alike: it is much easier for humans to use a high-level PL like C/C++, Java, Python, etc., but it is easier for a CPU to use a much lower level of abstraction. Perhaps it would be possible to invent a CPU that can execute high-level PL code, but it would be many times more complex than the CPUs we know of today. In a similar fashion, it is very inconvenient for humans to write in assembly language, due to it being so low-level and difficult to write in without making a huge number of annoying mistakes. The program that converts the high-level PL code into assembly is called a compiler2.
1.1 A couple of words about different ISAs
The x86 ISA has always been one with variable-length opcodes, so when the 64-bit era came, the x64 extensions did not impact the ISA very significantly. In fact, the x86 ISA still contains a lot of instructions that first appeared in the 16-bit 8086 CPU, yet are still found in the CPUs of today.
ARM is a RISC3 CPU designed with constant-length opcodes in mind, which had some advantages in the past. In the very beginning, all ARM instructions were encoded in 4 bytes4. This is now referred to as “ARM mode”. Then they thought it wasn’t as frugal as they first imagined. In fact, the most used CPU instructions5 in real-world applications can be encoded using less information. They therefore added another ISA, called Thumb, where each instruction was encoded in just 2 bytes. This is now referred to as “Thumb mode”. However, not all ARM instructions can be encoded in just 2 bytes, so the Thumb instruction set is somewhat limited. It is worth noting that code compiled for ARM mode and Thumb mode may of course coexist within one single program.
The ARM creators thought Thumb could be extended, giving rise to Thumb-2, which appeared in ARMv7. Thumb-2 still uses 2-byte instructions, but has some new instructions which have the size of 4 bytes. There is a common misconception that Thumb-2 is a mix of ARM and Thumb. This is incorrect. Rather, Thumb-2 was extended to fully support all processor features so it could compete with ARM mode, a goal that was clearly achieved, as the majority of applications for iPod/iPhone/iPad are compiled for the Thumb-2 instruction set (admittedly, largely due to the fact that Xcode does this by default).
Later the 64-bit ARM came out. This ISA has 4-byte opcodes and has no need for any additional Thumb mode. However, the 64-bit requirements affected the ISA, resulting in us now having three ARM instruction sets: ARM mode, Thumb mode (including Thumb-2) and ARM64. These ISAs intersect partially, but it can be said that they are different ISAs, rather than variations of the same one. Therefore, we will try to add fragments of code in all three ARM ISAs in this book. There are, by the way, many other RISC ISAs with fixed-length 32-bit opcodes, such as MIPS, PowerPC and Alpha AXP.
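One practical consequence of fixed-length encoding is that the address of the next (or previous) instruction can be computed with plain arithmetic. The following C sketch shows this under the sizes given above (4 bytes in ARM mode, 2 bytes in the original Thumb mode). The names insn_addr, ARM_INSN_SIZE and THUMB_INSN_SIZE are invented for illustration, and the calculation does not apply to Thumb-2 or x86, where instruction sizes vary.

```c
#include <stdint.h>

/* Instruction sizes as stated in the text; the names are hypothetical. */
enum { ARM_INSN_SIZE = 4, THUMB_INSN_SIZE = 2 };

/* Address of the instruction n steps after (n > 0) or before (n < 0) addr,
 * assuming every instruction occupies exactly insn_size bytes.
 * Unsigned arithmetic wraps modulo 2^32, so negative n works as expected. */
uint32_t insn_addr(uint32_t addr, int32_t n, uint32_t insn_size)
{
    return addr + (uint32_t)n * insn_size;
}
```

For example, insn_addr(0x8000, 1, ARM_INSN_SIZE) yields 0x8004, and insn_addr(0x8000, -1, THUMB_INSN_SIZE) yields 0x7FFE.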
1Programming language
2Old-school Russian literature also uses the term “translator”.
3Reduced instruction set computing
4By the way, fixed-length instructions are handy because one can calculate the next (or previous) instruction address without effort. This feature will be discussed in the switch() operator (13.2.2 on page 162) section.
5These are MOV/PUSH/CALL/Jcc
Chapter 2
The simplest Function
The simplest possible function is arguably one that simply returns a constant value. Here it is:
Listing 2.1: C/C++ Code
int f()
{
    return 123;
};
Let’s compile it!
2.1 x86
Here’s what both the optimizing GCC and MSVC compilers produce on the x86 platform:
Listing 2.2: Optimizing GCC/MSVC (assembly output)
f:
    mov eax, 123
    ret
There are just two instructions: the first places the value 123 into the EAX register, which is used by convention for storing the return value, and the second one is RET, which returns execution to the caller.
The caller will take the result from the EAX register.
2.2 ARM
There are a few differences on the ARM platform:
Listing 2.3: Optimizing Keil 6/2013 (ARM mode) ASM Output
f PROC
    MOV r0,#0x7b ; 123
    BX lr
    ENDP
ARM uses the register R0 for returning the results of functions, so 123 is copied into R0.
The return address is not saved on the local stack in the ARM ISA, but rather in the link register, so the BX LR instruction causes execution to jump to that address, effectively returning execution to the caller.
It is worth noting that MOV is a misleading name for the instruction in both x86 and ARM ISAs.
The data is not in fact moved, but copied.
2.3 MIPS
There are two naming conventions used in the world of MIPS when naming registers: by number (from $0 to $31) or by pseudoname ($V0, $A0, etc).
The GCC assembly output below lists registers by number:
Listing 2.4: Optimizing GCC 4.4.5 (assembly output)
j $31
li $2,123 # 0x7b
…while IDA1 lists them by their pseudonames:
Listing 2.5: Optimizing GCC 4.4.5 (IDA)
jr $ra
li $v0, 0x7B
The $2 (or $V0) register is used to store the function’s return value. LI stands for “Load Immediate” and is the MIPS equivalent to MOV.
The other instruction is the jump instruction (J or JR) which returns the execution flow to the caller, jumping to the address in the $31 (or $RA) register.
This is the register analogous to LR2 in ARM.
You might be wondering why the positions of the load instruction (LI) and the jump instruction (J or JR) are swapped. This is due to a RISC feature called “branch delay slot”.
The reason this happens is a quirk in the architecture of some RISC ISAs and isn’t important for our purposes: we just need to remember that in MIPS, the instruction following a jump or branch instruction is executed before the jump/branch itself takes effect.
As a consequence, branch instructions always swap places with the instruction which must be executed beforehand.
2.3.1 A note about MIPS instruction/register names
Register and instruction names in the world of MIPS are traditionally written in lowercase. However, for the sake of consistency, we’ll stick to using uppercase letters, as it is the convention followed by all other ISAs featured in this book.
1Interactive Disassembler and debugger developed by Hex-Rays
2Link Register
Chapter 3
Hello, world!
Let’s use the famous example from the book “The C Programming Language” [Ker88]:
#include <stdio.h>
int main() {
printf("hello, world\n");
return 0;
}
3.1 x86
3.1.1 MSVC
Let’s compile it in MSVC 2010:
cl 1.cpp /Fa1.asm
(The /Fa option instructs the compiler to generate an assembly listing file.)
Listing 3.1: MSVC 2010
CONST SEGMENT
$SG3830 DB 'hello, world', 0AH, 00H
CONST ENDS
PUBLIC _main
EXTRN _printf:PROC
; Function compile flags: /Odtp
_TEXT SEGMENT
_main PROC
    push ebp
    mov ebp, esp
    push OFFSET $SG3830
    call _printf
    add esp, 4
    xor eax, eax
    pop ebp
    ret 0
_main ENDP
_TEXT ENDS
MSVC produces assembly listings in Intel-syntax. The difference between Intel-syntax and AT&T-syntax will be discussed in 3.1.3 on page 8.
The compiler generated the file, 1.obj, which is to be linked into 1.exe. In our case, the file contains two segments: CONST (for data constants) and _TEXT (for code).
The string hello, world in C/C++ has type const char[] [Str13, p176, 7.3.2], but it does not have its own name. The compiler needs to deal with the string somehow, so it defines the internal name $SG3830 for it.
That is why the example may be rewritten as follows:
#include <stdio.h>
const char $SG3830[]="hello, world\n";
int main() {
printf($SG3830);
return 0;
}
Let’s go back to the assembly listing. As we can see, the string is terminated by a zero byte, which is standard for C/C++ strings. More about C/C++ strings: 57.1.1 on page 630.
In the code segment, _TEXT, there is only one function so far: main(). The function main() starts with prologue code and ends with epilogue code (like almost any function)1.
After the function prologue we see the call to the printf() function: CALL _printf. Before the call the string address (or a pointer to it) containing our greeting is placed on the stack with the help of the PUSH instruction.
When the printf() function returns control to the main() function, the string address (or a pointer to it) is still on the stack. Since we do not need it anymore, the stack pointer (the ESP register) needs to be corrected.
ADD ESP, 4 means add 4 to the ESP register value.
Why 4? Since this is a 32-bit program, we need exactly 4 bytes for passing an address through the stack. If it were x64 code, we would need 8 bytes. ADD ESP, 4 is effectively equivalent to POP register but without using any register2.
For the same purpose, some compilers (like the Intel C++ Compiler) may emit POP ECX instead of ADD (e.g., such a pattern can be observed in the Oracle RDBMS code as it is compiled with the Intel C++ compiler). This instruction has almost the same effect but the ECX register contents will be overwritten. The Intel C++ compiler probably uses POP ECX since this instruction’s opcode is shorter than ADD ESP, x (1 byte for POP against 3 for ADD).
Here is an example of using POP instead of ADD from Oracle RDBMS:
Listing 3.2: Oracle RDBMS 10.2 Linux (app.o file)
.text:0800029A push ebx
.text:0800029B call qksfroChild
.text:080002A0 pop ecx
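The stack cleanup described above can be modeled with a toy stack in C. This is only a sketch: the toy_stack type and the toy_* functions are invented for illustration, the "stack" is a 64-byte array, and ESP is represented as a byte offset into it rather than a real address.

```c
#include <stdint.h>
#include <string.h>

/* Toy model (hypothetical names): a stack that grows downwards,
 * with esp holding the byte offset of the current top. */
typedef struct {
    uint8_t  mem[64];
    uint32_t esp;
} toy_stack;

/* An empty stack: esp points just past the end of the memory block. */
void toy_init(toy_stack *s)
{
    s->esp = (uint32_t)sizeof s->mem;
}

/* PUSH value: decrement ESP by 4, then store the 32-bit value there. */
void toy_push(toy_stack *s, uint32_t value)
{
    s->esp -= 4;
    memcpy(&s->mem[s->esp], &value, 4);
}

/* ADD ESP, bytes: discard data from the top without reading it. */
void toy_add_esp(toy_stack *s, uint32_t bytes)
{
    s->esp += bytes;
}

/* POP reg: read the top 32-bit value into a "register", then discard it. */
uint32_t toy_pop(toy_stack *s)
{
    uint32_t reg;
    memcpy(&reg, &s->mem[s->esp], 4);
    s->esp += 4;
    return reg;
}
```

After one toy_push(), both toy_add_esp(&s, 4) and toy_pop(&s) leave esp at the same value. The only difference is that toy_pop() also hands the discarded value back, just as POP ECX overwrites ECX.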
After calling printf(), the original C/C++ code contains the statement return 0, which returns 0 as the result of the main() function.
In the generated code this is implemented by the instruction XOR EAX, EAX.
XOR is in fact just “eXclusive OR”3, but compilers often use it instead of MOV EAX, 0, again because it is a slightly shorter opcode (2 bytes for XOR against 5 for MOV).
Some compilers emit SUB EAX, EAX, which means SUBtract the value in the EAX from the value in EAX, which, in any case, results in zero.
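That both idioms produce zero regardless of the previous register contents can be checked with the corresponding C operators. The function names below are invented for illustration, and the parameter merely plays the role of EAX.

```c
#include <stdint.h>

/* XOR EAX, EAX: every bit is XORed with itself, giving 0. */
uint32_t zero_by_xor(uint32_t eax)
{
    return eax ^ eax;
}

/* SUB EAX, EAX: a value minus itself is 0. */
uint32_t zero_by_sub(uint32_t eax)
{
    return eax - eax;
}
```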
The last instruction, RET, returns control to the caller. Usually, this is C/C++ CRT4 code, which, in turn, returns control to the OS.
3.1.2 GCC
Now let’s try to compile the same C/C++ code in the GCC 4.4.1 compiler in Linux: gcc 1.c -o 1. Next, with the assistance of the IDA disassembler, let’s see how the main() function was created. IDA, like MSVC, uses Intel-syntax5.
Listing 3.3: code in IDA
main proc near
var_10 = dword ptr -10h
    push ebp
    mov ebp, esp
    and esp, 0FFFFFFF0h
    sub esp, 10h
    mov eax, offset aHelloWorld ; "hello, world\n"
    mov [esp+10h+var_10], eax
    call _printf
    mov eax, 0
    leave
    retn
main endp
1You can read more about it in the section about function prologues and epilogues (4 on page 22).
2CPU flags, however, are modified
3wikipedia
4C runtime library: 68.1 on page 668
5We could also have GCC produce assembly listings in Intel-syntax by applying the options -S -masm=intel.
The result is almost the same. The address of the hello, world string (stored in the data segment) is loaded in the EAX register first and then it is saved onto the stack. In addition, the function prologue contains AND ESP, 0FFFFFFF0h: this instruction aligns the ESP register value on a 16-byte boundary. This results in all values in the stack being aligned the same way (the CPU performs better if the values it is dealing with are located in memory at addresses aligned on a 4-byte or 16-byte boundary)6.
SUB ESP, 10h allocates 16 bytes on the stack. Although, as we can see hereafter, only 4 are necessary here.
This is because the size of the allocated stack is also aligned on a 16-byte boundary.
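The effect of these two prologue instructions on the stack pointer can be sketched in C. The function names and the sample ESP values used in the note below are made up for illustration; only the mask 0FFFFFFF0h and the constant 10h come from the listing.

```c
#include <stdint.h>

/* AND ESP, 0FFFFFFF0h: clear the low 4 bits, which rounds the value
 * down to the nearest multiple of 16. */
uint32_t align_down_16(uint32_t esp)
{
    return esp & 0xFFFFFFF0u;
}

/* AND ESP, 0FFFFFFF0h followed by SUB ESP, 10h: the result is still
 * 16-byte aligned, because 16 is subtracted from an aligned value. */
uint32_t esp_after_prologue(uint32_t esp)
{
    return align_down_16(esp) - 0x10u;
}
```

For a hypothetical starting value of 0xBFFFF12C, the AND yields 0xBFFFF120 and the subsequent SUB yields 0xBFFFF110. A value that is already aligned is left unchanged by the AND.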
The string address (or a pointer to the string) is then stored directly onto the stack without using the PUSH instruction.
var_10 is a local variable and is also an argument for printf(). Read about it below.
Then the printf() function is called.
Unlike MSVC, when GCC is compiling without optimization turned on, it emits MOV EAX, 0 instead of a shorter opcode.
The last instruction, LEAVE, is the equivalent of the MOV ESP, EBP and POP EBP instruction pair. In other words, this instruction sets the stack pointer (ESP) back and restores the EBP register to its initial state. This is necessary since we modified these register values (ESP and EBP) at the beginning of the function (by executing MOV EBP, ESP / AND ESP, …).
3.1.3 GCC: AT&T syntax
Let’s see how this can be represented in assembly language AT&T syntax. This syntax is much more popular in the UNIX-world.
Listing 3.4: let’s compile in GCC 4.7.3
gcc -S 1_1.c
We get this:
Listing 3.5: GCC 4.7.3
    .file "1_1.c"
    .section .rodata
.LC0:
    .string "hello, world\n"
    .text
    .globl main
    .type main, @function
main:
.LFB0:
    .cfi_startproc
    pushl %ebp
    .cfi_def_cfa_offset 8
    .cfi_offset 5, -8
    movl %esp, %ebp
    .cfi_def_cfa_register 5
    andl $-16, %esp
    subl $16, %esp
    movl $.LC0, (%esp)
    call printf
    movl $0, %eax
    leave
    .cfi_restore 5
    .cfi_def_cfa 4, 4
    ret
    .cfi_endproc
.LFE0:
    .size main, .-main
    .ident "GCC: (Ubuntu/Linaro 4.7.3-1ubuntu1) 4.7.3"
    .section .note.GNU-stack,"",@progbits
6Wikipedia: Data structure alignment
The listing contains many macros (beginning with a dot). These are not interesting for us at the moment.
For now, for the sake of simplification, we can ignore them (except the .string macro, which encodes a null-terminated character sequence just like a C-string). Then we’ll see this7:
Listing 3.6: GCC 4.7.3
.LC0:
    .string "hello, world\n"
main:
    pushl %ebp
    movl %esp, %ebp
    andl $-16, %esp
    subl $16, %esp
    movl $.LC0, (%esp)
    call printf
    movl $0, %eax
    leave
    ret
Some of the major differences between Intel and AT&T syntax are:
• Source and destination operands are written in opposite order.
In Intel-syntax: <instruction> <destination operand> <source operand>.
In AT&T syntax: <instruction> <source operand> <destination operand>.
Here is an easy way to memorise the difference: when you deal with Intel-syntax, you can imagine that there is an equality sign (=) between operands and when you deal with AT&T-syntax imagine there is a right arrow (→)8.
• AT&T: Before register names, a percent sign must be written (%) and before numbers a dollar sign ($). Parentheses are used instead of brackets.
• AT&T: A suffix is added to instructions to define the operand size:
– q — quad (64 bits) – l — long (32 bits) – w — word (16 bits) – b — byte (8 bits)
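The C library offers a familiar reminder of the Intel operand order, since memcpy() takes the destination first and the source second. The helper below is a hypothetical example (the name copy_greeting and the string are invented). It reads the same way as "MOV destination, source", whereas AT&T syntax would list the source first.

```c
#include <string.h>

/* Copies a short greeting into dst if it fits (including the zero byte).
 * Note the argument order of memcpy(): destination first, then source,
 * as with the operands in Intel-syntax. */
void copy_greeting(char *dst, size_t dst_size)
{
    static const char src[] = "hello";

    if (dst_size >= sizeof src)
        memcpy(dst, src, sizeof src);
}
```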
Let’s go back to the compiled result: it is identical to what we saw in IDA, with one subtle difference: 0FFFFFFF0h is presented as $-16. It is the same thing: 16 in the decimal system is 0x10 in hexadecimal. -0x10 is equal to 0xFFFFFFF0 (for a 32-bit data type).
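The equivalence of -16 and 0xFFFFFFF0 for a 32-bit type can be verified in C by converting a signed value to an unsigned one of the same width. The helper name as_u32 is invented for this sketch.

```c
#include <stdint.h>

/* Reinterpret a signed 32-bit value as unsigned. The C conversion rule
 * (reduction modulo 2^32) gives the same bit pattern as two's complement. */
uint32_t as_u32(int32_t value)
{
    return (uint32_t)value;
}
```

Here as_u32(-16) equals 0xFFFFFFF0, and adding 16 to that result wraps around to 0 in 32-bit unsigned arithmetic, which is another way to see that the pattern represents -16.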
One more thing: the return value is set to 0 by using the usual MOV, not XOR. MOV just loads a value to a register. Its name is a misnomer (data is not moved but rather copied). In other architectures, this instruction is named “LOAD” or “STORE” or something similar.
3.2 x86-64
3.2.1 MSVC—x86-64
Let’s also try 64-bit MSVC:
Listing 3.7: MSVC 2012 x64
$SG2989 DB 'hello, world', 0AH, 00H
main PROC
    sub rsp, 40
    lea rcx, OFFSET FLAT:$SG2989
    call printf
    xor eax, eax
    add rsp, 40
    ret 0
main ENDP
7This GCC option can be used to eliminate “unnecessary” macros: -fno-asynchronous-unwind-tables
8By the way, in some C standard functions (e.g., memcpy(), strcpy()) the arguments are listed in the same way as in Intel-syntax: first the pointer to the destination memory block, and then the pointer to the source memory block.
In x86-64, all registers were extended to 64-bit and now their names have an R- prefix. In order to use the stack less often (in other words, to access external memory/cache less often), there exists a popular way to pass function arguments via registers (fastcall, 64.3 on page 649). I.e., a part of the function arguments is passed in registers, the rest via the stack. In Win64, 4 function arguments are passed in the RCX, RDX, R8, R9 registers. That is what we see here: a pointer to the string for printf() is now passed not in the stack, but in the RCX register. The pointers are 64-bit now, so they are passed in the 64-bit registers (which have the R- prefix). However, for backward compatibility, it is still possible to access the 32-bit parts, using the E- prefix. This is what the RAX/EAX/AX/AL register looks like in x86-64:
Byte number | 7th 6th 5th 4th 3rd 2nd 1st 0th
RAX (x64)   | bytes 7 to 0
EAX         | bytes 3 to 0
AX          | bytes 1 to 0
AH          | byte 1
AL          | byte 0
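The register layout above can be expressed in C with masks and shifts on a 64-bit value. The function names are invented for illustration, and the sample value in the note below is arbitrary.

```c
#include <stdint.h>

/* EAX: the low 32 bits (bytes 3 to 0) of RAX. */
uint32_t get_eax(uint64_t rax) { return (uint32_t)(rax & 0xFFFFFFFFu); }

/* AX: the low 16 bits (bytes 1 to 0) of RAX. */
uint16_t get_ax(uint64_t rax)  { return (uint16_t)(rax & 0xFFFFu); }

/* AH: byte 1 of RAX (the high half of AX). */
uint8_t  get_ah(uint64_t rax)  { return (uint8_t)((rax >> 8) & 0xFFu); }

/* AL: byte 0 of RAX (the low half of AX). */
uint8_t  get_al(uint64_t rax)  { return (uint8_t)(rax & 0xFFu); }
```

With RAX holding 0x1122334455667788, these helpers return 0x55667788 for EAX, 0x7788 for AX, 0x77 for AH and 0x88 for AL.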
The main() function returns an int-typed value, which is, in C/C++, for better backward compatibility and portability, still 32-bit, so that is why the EAX register is cleared at the function end (i.e., the 32-bit part of the register) instead of RAX. There are also 40 bytes allocated in the local stack. This is called the “shadow space”, about which we are going to talk later: 8.2.1 on page 91.
3.2.2 GCC—x86-64
Let’s also try GCC in 64-bit Linux:
Listing 3.8: GCC 4.4.6 x64
.string "hello, world\n"
main:
    sub rsp, 8
    mov edi, OFFSET FLAT:.LC0 ; "hello, world\n"
    xor eax, eax ; number of vector registers passed
    call printf
    xor eax, eax
    add rsp, 8
    ret
A method of passing function arguments in registers is also used in Linux, *BSD and Mac OS X [Mit13].
The first 6 arguments are passed in the RDI, RSI, RDX, RCX, R8, R9 registers, and the rest—via the stack.
So the pointer to the string is passed in EDI (the 32-bit part of the register). But why not use the 64-bit part, RDI?
It is important to keep in mind that all MOV instructions in 64-bit mode that write something into the lower 32-bit register part also clear the higher 32-bits [Int13]. I.e., the MOV EAX, 011223344h writes a value into RAX correctly, since the higher bits will be cleared.
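This zero-extension rule can be modeled in C. The sketch below is a model of the register semantics, not real register access, and the function names are invented. The second function is included for contrast and relies on a fact not stated in the text above: a write to a 16-bit register part such as AX leaves the upper bits of RAX untouched.

```c
#include <stdint.h>

/* Model of MOV EAX, value in 64-bit mode: the old contents of RAX are
 * irrelevant, because the upper 32 bits of the result are cleared. */
uint64_t write_eax(uint64_t old_rax, uint32_t value)
{
    (void)old_rax;
    return (uint64_t)value;
}

/* Model of MOV AX, value: only the low 16 bits are replaced. */
uint64_t write_ax(uint64_t old_rax, uint16_t value)
{
    return (old_rax & ~0xFFFFull) | value;
}
```

Starting from an RAX filled with ones, write_eax(..., 0x11223344) gives 0x0000000011223344, while write_ax(..., 0x1234) gives 0xFFFFFFFFFFFF1234.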
If we open the compiled object file (.o), we can also see all the instructions’ opcodes9:
Listing 3.9: GCC 4.4.6 x64
.text:00000000004004D0 main proc near
.text:00000000004004D0 48 83 EC 08 sub rsp, 8
.text:00000000004004D4 BF E8 05 40 00 mov edi, offset format ; "hello, world\n"
.text:00000000004004D9 31 C0 xor eax, eax
.text:00000000004004DB E8 D8 FE FF FF call _printf
.text:00000000004004E0 31 C0 xor eax, eax
.text:00000000004004E2 48 83 C4 08 add rsp, 8
.text:00000000004004E6 C3 retn
.text:00000000004004E6 main endp
As we can see, the instruction that writes into EDI at 0x4004D4 occupies 5 bytes. The same instruction writing a 64-bit value into RDI occupies 7 bytes. Apparently, GCC is trying to save some space. Besides, it can be sure that the data segment containing the string will not be allocated at the addresses higher than 4GiB.
We also see that the EAX register was cleared before the printf() function call. This is done because the number of used vector registers is passed in EAX in *NIX systems on x86-64 ([Mit13]).
9This must be enabled in Options→ Disassembly → Number of opcode bytes
3.3 GCC—one more thing
The fact that an anonymous C-string has const type (3.1.1 on page 6), and that C-strings allocated in constants segment are guaranteed to be immutable, has an interesting consequence: the compiler may use a specific part of the string.
Let’s try this example:
#include <stdio.h>
int f1() {
printf ("world\n");
}
int f2() {
printf ("hello world\n");
}
int main() {
f1();
f2();
}
Common C/C++-compilers (including MSVC) allocate two strings, but let’s see what GCC 4.8.1 does:
Listing 3.10: GCC 4.8.1 + IDA listing
f1 proc near
s = dword ptr -1Ch
    sub esp, 1Ch
    mov [esp+1Ch+s], offset s ; "world\n"
    call _puts
    add esp, 1Ch
    retn
f1 endp
f2 proc near
s = dword ptr -1Ch
    sub esp, 1Ch
    mov [esp+1Ch+s], offset aHello ; "hello "
    call _puts
    add esp, 1Ch
    retn
f2 endp
aHello db 'hello '
s db 'world',0xa,0
Indeed: when we print the “hello world” string these two words are positioned in memory adjacently and puts() called from f2() function is not aware that this string is divided. In fact, it’s not divided; it’s divided only “virtually”, in this listing.
When puts() is called from f1(), it uses the “world” string plus a zero byte. puts() is not aware that there is something before this string!
This clever trick is often used by at least GCC and can save some memory.
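The same idea can be written out by hand in C: one buffer holds the longer string, and the shorter string is simply a pointer into its tail. This is a sketch of the principle, not of what GCC does internally. The names hello_world, whole_string and tail_string are invented for illustration.

```c
#include <string.h>

/* A single zero-terminated buffer shared by both "strings". */
static const char hello_world[] = "hello world\n";

/* The full string starts at the beginning of the buffer. */
const char *whole_string(void)
{
    return hello_world;
}

/* The shorter string starts 6 bytes in, just past "hello ".
 * It shares the rest of the buffer, including the terminating zero byte. */
const char *tail_string(void)
{
    return hello_world + 6;
}
```

A function that receives tail_string() sees only "world\n" and has no way of knowing that other characters precede it in memory, which is why the sharing is safe as long as the string is never modified.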
3.4 ARM
For my experiments with ARM processors, several compilers were used:
• Popular in the embedded area: Keil Release 6/2013.
• Apple Xcode 4.6.3 IDE (with the LLVM-GCC 4.2 compiler10).
10It is indeed so: Apple Xcode 4.6.3 uses open-source GCC as front-end compiler and LLVM code generator