Static and Dynamic Binary Analysis - Mobile Application Security Testing Guide (MASTG)

Reverse engineering is the process of reconstructing the semantics of a compiled program’s source code. In other words, you take the program apart, run it, simulate parts of it, and do other unspeakable things to it to understand what it does and how.

Using Disassemblers and Decompilers

Disassemblers and decompilers allow you to translate an app’s binary code or bytecode back into a more or less understandable format. By using these tools on native binaries, you can obtain assembler code that matches the architecture the app was compiled for. Disassemblers convert machine code to assembly code which in turn is used by decompilers to generate equivalent high-level language code. Android Java apps can be disassembled to smali, which is an assembly language for the DEX format used by Dalvik, Android’s Java VM. Smali assembly can also be quite easily decompiled back to equivalent Java code.

In theory, the mapping between assembly and machine code should be one-to-one, and therefore it may give the impression that disassembling is a simple task. But in practice, there are multiple pitfalls such as:

• Reliable distinction between code and data.

• Variable instruction size.

• Indirect branch instructions.

• Functions without explicit CALL instructions within the executable’s code segment.

• Position independent code (PIC) sequences.

• Hand crafted assembly code.

Similarly, decompilation is a very complicated process, involving many deterministic and heuristic based approaches. As a consequence, decompilation is usually not really accurate, but nevertheless very helpful in getting a quick understanding of the function being analyzed. The accuracy of decompilation depends on the amount of information available in the code being decompiled and the sophistication of the decompiler. In addition, many compilation and post-compilation tools introduce additional complexity to the compiled code in order to increase the diﬃculty of comprehension and/or even decompilation itself. Such code referred to asobfuscated code.

Over the past decades many tools have perfected the process of disassembly and decompilation, producing output with high fidelity. Advanced usage instructions for any of the available tools can often easily fill a book of their own. The best way to get started is to simply pick up a tool that fits your needs and budget and get a well-reviewed user guide. In this section, we will provide an introduction to some of those tools and in the subsequent “Reverse Engineering and Tampering”

Android and iOS chapters we’ll focus on the techniques themselves, especially those that are speciﬁc to the platform at hand.

Obfuscation

Obfuscation is the process of transforming code and data to make it more difficult to comprehend (and sometimes even difficult to disassemble). It is usually an integral part of the software pro- tection scheme. Obfuscation isn’t something that can be simply turned on or off, programs can be made incomprehensible, in whole or in part, in many ways and to different degrees.

Note: All presented techniques below will not stop someone with enough time and budget from reverse engineering your app. However, combining these techniques will make their job signiﬁcantly harder. The aim is thus to discourage reverse engineers from performing further analysis and not making it worth the eﬀort.

The following techniques can be used to obfuscate an application:

• Name obfuscation

• Instruction substitution

• Control ﬂow ﬂattening

• Dead code injection

• String encryption

• Packing

Name Obfuscation

The standard compiler generates binary symbols based on class and function names from the source code. Therefore, if no obfuscation is applied, symbol names remain meaningful and can easily be extracted from the app binary. For instance, a function which detects a jailbreak can be located by searching for relevant keywords (e.g. “jailbreak”). The listing below shows the

disassembled function JailbreakDetectionViewController.jailbreakTest4Tapped from the Damn Vulnerable iOS App (DVIA-v2).

__T07DVIA_v232JailbreakDetectionViewControllerC20jailbreakTest4TappedyypF:

stp x22, x21, [sp, #-0x30]!

mov rbp, rsp

After the obfuscation we can observe that the symbol’s name is no longer meaningful as shown on the listing below.

__T07DVIA_v232zNNtWKQptikYUBNBgfFVMjSkvRdhhnbyyFySbyypF:

stp x22, x21, [sp, #-0x30]!

mov rbp, rsp

Nevertheless, this only applies to the names of functions, classes and ﬁelds. The actual code remains unmodiﬁed, so an attacker can still read the disassembled version of the function and try to understand its purpose (e.g. to retrieve the logic of a security algorithm).

Instruction Substitution

This technique replaces standard binary operators like addition or subtraction with more complex representations. For example, an additionx = a + b can be represented asx = -(-a) - (-b).

However, using the same replacement representation could be easily reversed, so it is recom- mended to add multiple substitution techniques for a single case and introduce a random factor.

This technique can be reversed during decompilation, but depending on the complexity and depth of the substitutions, reversing it can still be time consuming.

Control Flow Flattening

Control flow flattening replaces original code with a more complex representation. The transfor- mation breaks the body of a function into basic blocks and puts them all inside a single infinite loop with a switch statement that controls the program flow. This makes the program flow signif- icantly harder to follow because it removes the natural conditional constructs that usually make the code easier to read.

The image shows how control flow flattening alters code (see “Obfuscating C++ programs via control flow flattening”)

Dead Code Injection

This technique makes the program’s control ﬂow more complex by injecting dead code into the program. Dead code is a stub of code that doesn’t aﬀect the original program’s behavior but increases the overhead of the reverse engineering process.

String Encryption

Applications are often compiled with hardcoded keys, licences, tokens and endpoint URLs. By default, all of them are stored in plaintext in the data section of an application’s binary. This technique encrypts these values and injects stubs of code into the program that will decrypt that data before it is used by the program.

Packing

Packing is a dynamic rewriting obfuscation technique which compresses or encrypts the original executable into data and dynamically recovers it during execution. Packing an executable changes the ﬁle signature in an attempt to avoid signature-based detection.

Debugging and Tracing

In the traditional sense, debugging is the process of identifying and isolating problems in a program as part of the software development life cycle. The same tools used for debugging are valuable to reverse engineers even when identifying bugs is not the primary goal. Debuggers enable program suspension at any point during runtime, inspection of the process’ internal state, and even register and memory modiﬁcation. These abilities simplify program inspection.

Debuggingusually means interactive debugging sessions in which a debugger is attached to the running process. In contrast, tracing refers to passive logging of information about the app’s execution (such as API calls). Tracing can be done in several ways, including debugging APIs, function hooks, and Kernel tracing facilities. Again, we’ll cover many of these techniques in the OS-speciﬁc “Reverse Engineering and Tampering” chapters.

Dalam dokumen Mobile Application Security Testing Guide (MASTG) (Halaman 49-53)