However, there are also a few Android-specific challenges. For example, you’ll need to deal with both Java bytecode and native code. Java Native Interface (JNI) is sometimes deliberately used to confuse reverse engineers (to be fair, there are legitimate reasons for using JNI, such as improving performance or supporting legacy code). Developers sometimes use the native layer to “hide” data and functionality, and they may structure their apps such that execution frequently jumps between the two layers.
You’ll need at least a working knowledge of both the Java-based Android environment and the Linux OS and kernel, on which Android is based. You’ll also need the right toolset to deal with both the bytecode running on the Android runtime (Dalvik or ART) and the native code.
Note that we’ll use the OWASP UnCrackable Apps for Android as examples for demonstrating various reverse engineering techniques in the following sections, so expect partial and full spoilers.
We encourage you to have a crack at the challenges yourself before reading on!
Reverse Engineering
Reverse engineering is the process of taking an app apart to find out how it works. You can do this by examining the compiled app (static analysis), observing the app during runtime (dynamic analysis), or a combination of both.
Disassembling and Decompiling
In Android app security testing, if the application is based solely on Java and doesn’t have any native code (C/C++ code), the reverse engineering process is relatively easy and recovers (decompiles) almost all the source code. In those cases, black-box testing (with access to the compiled binary, but not the original source code) can get pretty close to white-box testing.
Nevertheless, if the code has been purposefully obfuscated (or some tool-breaking anti-decompilation tricks have been applied), the reverse engineering process may be very time-consuming and unproductive. This also applies to applications that contain native code: they can still be reverse engineered, but the process is largely manual and requires knowledge of low-level details.
Decompiling Java Code
Java Disassembled Code (smali):
If you want to inspect the app’s smali code (instead of Java), you can open your APK in Android Studio by clicking Profile or debug APK from the “Welcome screen” (even if you don’t intend to debug it, you can take a look at the smali code).
Alternatively, you can use apktool to extract and decode resources directly from the APK archive and disassemble Java bytecode to smali. apktool also allows you to reassemble the package, which is useful for patching the app or applying changes to, e.g., the Android Manifest.
Java Decompiled Code:
If you want to look directly at Java source code in a GUI, simply open your APK using jadx or Bytecode Viewer.
Android decompilers go one step further and attempt to convert Android bytecode back into Java source code, making it more human-readable. Fortunately, Java decompilers generally handle Android bytecode well. The above-mentioned tools embed, and sometimes even combine, popular free decompilers such as:
• JD
• JAD
• jadx
• Procyon
• CFR
Alternatively, you can use the APKLab extension for Visual Studio Code, run apkx on your APK, or use the files exported by the previous tools to open the reversed source code in your preferred IDE.
In the following example, we’ll be using the UnCrackable App for Android Level 1. First, let’s install the app on a device or emulator and run it to see what the crackme is about.
Seems like we’re expected to find some kind of secret code!
We’re looking for a secret string stored somewhere inside the app, so the next step is to look inside.
First, unzip the APK file (unzip UnCrackable-Level1.apk -d UnCrackable-Level1) and look at the content. In the standard setup, all the Java bytecode and the constants it uses (such as string literals) are in the file classes.dex in the app root directory (UnCrackable-Level1/). This file conforms to the Dalvik Executable Format (DEX), an Android-specific way of packaging Java programs. Most Java decompilers take plain class files or JARs as input, so you need to convert the classes.dex file into a JAR first, for example with dex2jar.
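To see the DEX format for yourself, the following Java sketch opens the APK directly as a ZIP archive, picks out the classes*.dex entries, and checks that each one starts with the DEX magic: the bytes "dex\n", a three-digit format version such as "035", and a NUL byte. The class name DexEntryLister is made up for this illustration; it is not part of dex2jar or any other tool mentioned in this section.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class DexEntryLister {

    private static boolean isAsciiDigit(byte b) {
        return b >= '0' && b <= '9';
    }

    // True if the header starts with the DEX magic: "dex\n", three ASCII
    // digits for the format version, then a NUL byte.
    static boolean isDexMagic(byte[] header) {
        return header.length >= 8
                && header[0] == 'd' && header[1] == 'e' && header[2] == 'x' && header[3] == '\n'
                && isAsciiDigit(header[4]) && isAsciiDigit(header[5]) && isAsciiDigit(header[6])
                && header[7] == 0;
    }

    // Extracts the version string (for example "035") from a valid header.
    static String dexVersion(byte[] header) {
        return new String(header, 4, 3, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        try (ZipFile apk = new ZipFile(args[0])) {
            Enumeration<? extends ZipEntry> entries = apk.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                // classes.dex, classes2.dex, ... sit in the root of the archive
                if (!name.matches("classes\\d*\\.dex")) {
                    continue;
                }
                try (InputStream in = apk.getInputStream(entry)) {
                    byte[] header = in.readNBytes(8); // Java 11+
                    String verdict = isDexMagic(header)
                            ? "DEX version " + dexVersion(header)
                            : "no DEX magic found";
                    System.out.println(name + ": " + verdict);
                }
            }
        }
    }
}
```

You could run it with java DexEntryLister.java UnCrackable-Level1.apk on Java 11 or later. An app split across several DEX files (multidex) would also list classes2.dex and so on.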
Once you have a JAR file, you can use any free decompiler to produce Java code. In this example, we’ll use the CFR decompiler. CFR is under active development, and brand-new releases are available on the author’s website. CFR is released under the MIT license, so you can use it freely.
The easiest way to run CFR is through apkx, which also packages dex2jar and automates extraction, conversion, and decompilation. Run it on the APK and you should find the decompiled sources in the directory Uncrackable-Level1/src. To view the sources, a simple text editor (preferably with syntax highlighting) is fine, but loading the code into a Java IDE makes navigation easier.
Let’s import the code into IntelliJ, which also provides on-device debugging functionality.
Open IntelliJ and select “Android” as the project type in the left tab of the “New Project” dialog.
Enter “Uncrackable1” as the application name and “vantagepoint.sg” as the company name. This results in the package name “sg.vantagepoint.uncrackable1”, which matches the original package name. Using a matching package name is important if you want to attach the debugger to the running app later on because IntelliJ uses the package name to identify the correct process.
In the next dialog, pick any API level; you don’t actually want to compile the project, so the number doesn’t matter. Click “Next” and choose “Add no Activity”, then click “Finish”.
Once you have created the project, expand the “1: Project” view on the left and navigate to the folder app/src/main/java. Right-click and delete the default package “sg.vantagepoint.uncrackable1” created by IntelliJ.
Now, open the Uncrackable-Level1/src directory in a file browser and drag the sg directory into the now empty java folder in the IntelliJ project view (hold the “alt” key to copy the folder instead of moving it).
You’ll end up with a structure that resembles the original Android Studio project from which the app was built.
See the section “Reviewing Decompiled Java Code” below to learn how to proceed when inspecting the decompiled Java code.
Disassembling Native Code
Dalvik and ART both support the Java Native Interface (JNI), which defines a way for Java code to interact with native code written in C/C++. As on other Linux-based operating systems, native code is packaged (compiled) into ELF dynamic libraries (*.so), which the Android app loads at runtime via the System.loadLibrary or System.load methods. However, instead of relying on widely used C libraries (such as glibc), Android binaries are built against a custom libc named Bionic. Bionic adds support for important Android-specific services such as system properties and logging, and it is not fully POSIX-compatible.
When reversing an Android application containing native code, we need to understand a couple of data structures related to the JNI bridge between Java and native code. From the reversing perspective, the two key data structures are JavaVM and JNIEnv. Both of them are pointers to pointers to function tables:
• JavaVM provides an interface to invoke functions for creating and destroying a JavaVM. Android allows only one JavaVM per process, and it is not really relevant for our reversing purposes.
• JNIEnv provides access to most of the JNI functions, which are accessible at a fixed offset through the JNIEnv pointer. This JNIEnv pointer is the first parameter passed to every native function that implements a Java native method. We will discuss this concept again with the help of an example later in this chapter.
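Because JNIEnv ultimately points to a table of function pointers, a call such as (*env)->NewStringUTF(env, "...") typically shows up in the disassembly as a load from a fixed offset into that table. The short Java sketch below computes those offsets so you can recognize them. The indices are the ones the JNI specification gives for the JNIEnv interface function table; only a handful of entries are included, and the class name JniTableOffsets is made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JniTableOffsets {

    // A few entries of the JNIEnv interface function table with their indices
    // from the JNI specification (slots 0-3 are reserved).
    static final Map<String, Integer> INDEX = new LinkedHashMap<>();
    static {
        INDEX.put("GetVersion", 4);
        INDEX.put("FindClass", 6);
        INDEX.put("GetMethodID", 33);
        INDEX.put("NewStringUTF", 167);
        INDEX.put("GetStringUTFChars", 169);
    }

    // Every slot is one pointer wide: 4 bytes on armeabi-v7a, 8 on arm64-v8a.
    static int offsetOf(String function, int pointerSize) {
        Integer index = INDEX.get(function);
        if (index == null) {
            throw new IllegalArgumentException("not in this sketch's table: " + function);
        }
        return index * pointerSize;
    }

    // Reverse lookup: which of the known functions sits at a given offset?
    static String functionAt(int offset, int pointerSize) {
        for (Map.Entry<String, Integer> e : INDEX.entrySet()) {
            if (e.getValue() * pointerSize == offset) {
                return e.getKey();
            }
        }
        return null; // unknown to this partial table
    }

    public static void main(String[] args) {
        for (String name : INDEX.keySet()) {
            System.out.printf("%-18s 32-bit: 0x%03x  64-bit: 0x%03x%n",
                    name, offsetOf(name, 4), offsetOf(name, 8));
        }
    }
}
```

For example, on 32-bit ARM NewStringUTF sits at offset 167 * 4 = 0x29C, while on arm64-v8a the same slot is at 0x538.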
It is worth highlighting that analyzing disassembled native code is much more challenging than analyzing disassembled Java code. When reversing the native code in an Android application, we will need a disassembler.
In the next example, we’ll reverse the HelloWord-JNI.apk from the OWASP MASTG repository. Installing and running it in an emulator or Android device is optional.
wget https://github.com/OWASP/owasp-mastg/raw/master/Samples/Android/01_HelloWorld-JNI/HelloWord-JNI.apk
This app is not exactly spectacular: all it does is show a label with the text “Hello from C++”. This is the app Android Studio generates by default when you create a new project with C/C++ support, which is just enough to show the basic principles of JNI calls.
Decompile the APK with apkx.
$ apkx HelloWord-JNI.apk
Extracting HelloWord-JNI.apk to HelloWord-JNI
Converting: classes.dex -> classes.jar (dex2jar)
dex2jar HelloWord-JNI/classes.dex -> HelloWord-JNI/classes.jar
Decompiling to HelloWord-JNI/src (cfr)
This extracts the source code into the HelloWord-JNI/src directory. The main activity is found in the file HelloWord-JNI/src/sg/vantagepoint/helloworldjni/MainActivity.java. The “Hello World” text view is populated in the onCreate method:
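public class MainActivity extends AppCompatActivity {
    static {
        // Loads libnative-lib.so (the build matching the device's ABI) when the class is initialized
        System.loadLibrary("native-lib");
    }

    @Override
    protected void onCreate(Bundle bundle) {
        super.onCreate(bundle);
        // Resource IDs appear as raw integers in decompiled code
        this.setContentView(2130968603);
        ((TextView)this.findViewById(2131427422)).setText((CharSequence)this.stringFromJNI());
    }

    // Implemented in native code and resolved at runtime via JNI
    public native String stringFromJNI();
}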
Note the declaration of public native String stringFromJNI at the bottom. The keyword “native” tells the Java compiler that this method is implemented in a native language. The corresponding function is resolved at runtime, but only if a native library that exports a global symbol with the expected name is loaded (the name encodes the package name, class name, and method name). In this example, this requirement is satisfied by the following C or C++ function:
JNIEXPORT jstring JNICALL Java_sg_vantagepoint_helloworldjni_MainActivity_stringFromJNI(JNIEnv *env, jobject)
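The symbol name is derived mechanically from the Java declaration. The following Java sketch applies the JNI "short name" rules to a class and method name: package separators become underscores, the characters _, ; and [ are escaped as _1, _2 and _3, and any other non-alphanumeric character becomes _0 followed by four lower-case hex digits. It only covers the short form; overloaded native methods use a longer form that appends __ and the mangled argument signature, which this sketch does not generate. JniNames is a made-up class name.

```java
public class JniNames {

    // Escapes one Java name following the JNI short-name rules.
    static String mangle(String name) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '.' || c == '/') {
                out.append('_');           // package separator
            } else if (c == '_') {
                out.append("_1");
            } else if (c == ';') {
                out.append("_2");
            } else if (c == '[') {
                out.append("_3");
            } else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')) {
                out.append(c);
            } else {
                // e.g. '$' in nested class names: _0 plus four lower-case hex
                // digits of the UTF-16 code unit
                out.append(String.format("_0%04x", (int) c));
            }
        }
        return out.toString();
    }

    // Short symbol name for a native method that is not overloaded.
    static String shortName(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }

    public static void main(String[] args) {
        System.out.println(shortName("sg.vantagepoint.helloworldjni.MainActivity", "stringFromJNI"));
    }
}
```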
So where is the native implementation of this function? If you look into the “lib” directory of the unzipped APK archive, you’ll see several subdirectories (one per supported processor architecture), each of them containing a version of the native library, in this case libnative-lib.so.
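The name passed to System.loadLibrary("native-lib") in MainActivity is not a file name. The JDK method System.mapLibraryName performs the platform-specific mapping, which on Linux and Android adds the lib prefix and the .so suffix. The sketch below combines it with the lib/&lt;abi&gt;/ directory layout; the helper name pathInApk and the ABI list in main are assumptions for illustration.

```java
public class NativeLibPath {

    // Maps the name given to System.loadLibrary to the file path inside the APK,
    // e.g. "native-lib" -> "lib/armeabi-v7a/libnative-lib.so" on a Linux host.
    static String pathInApk(String abi, String libraryName) {
        return "lib/" + abi + "/" + System.mapLibraryName(libraryName);
    }

    public static void main(String[] args) {
        // Hypothetical set of ABIs; check the actual lib/ directory of your APK.
        for (String abi : new String[] {"armeabi-v7a", "arm64-v8a", "x86"}) {
            System.out.println(pathInApk(abi, "native-lib"));
        }
    }
}
```

If you run it on macOS or Windows, mapLibraryName follows the host's conventions instead (.dylib or .dll), so the printed file names will differ from what you find in the APK.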
Which version System.loadLibrary loads is determined by the ABIs supported by the device the app is running on. Before moving ahead, pay attention to the first parameter passed to this JNI function: it is the same JNIEnv data structure that was discussed earlier in this section.
Following the naming convention mentioned above, you can expect the library to export a symbol called Java_sg_vantagepoint_helloworldjni_MainActivity_stringFromJNI. On Linux systems, you can retrieve the list of symbols with readelf (included in GNU binutils) or nm. On macOS, use the greadelf tool, which you can install via MacPorts or Homebrew. The following example uses greadelf:
$ greadelf -W -s libnative-lib.so | grep Java
3: 00004e49 112 FUNC GLOBAL DEFAULT 11 Java_sg_vantagepoint_helloworld_MainActivity_stringFromJNI
You can also see this using radare2’s rabin2:
$ rabin2 -s HelloWord-JNI/lib/armeabi-v7a/libnative-lib.so | grep -i Java
003 0x00000e78 0x00000e78 GLOBAL FUNC 16 Java_sg_vantagepoint_helloworldjni_MainActivity_stringFromJNI
This is the native function that eventually gets executed when the stringFromJNI native method is called.
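Going the other way is handy when you list the exported symbols of an unknown library. The following Java sketch decodes a short-form Java_... symbol back into the class and method it belongs to, by undoing the escapes (_1, _2, _3, _0xxxx) and treating every other underscore as a separator. It rejects long-form symbols of overloaded methods rather than parsing them; the class name JniSymbolDecoder is made up.

```java
import java.util.ArrayList;
import java.util.List;

public class JniSymbolDecoder {

    // Decodes a short-form JNI symbol into {fully qualified class, method}.
    static String[] decode(String symbol) {
        if (!symbol.startsWith("Java_")) {
            throw new IllegalArgumentException("not a JNI symbol: " + symbol);
        }
        String s = symbol.substring("Java_".length());
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '_') {
                current.append(c);
                continue;
            }
            char next = i + 1 < s.length() ? s.charAt(i + 1) : '\0';
            if (next == '1') {
                current.append('_');
                i++;
            } else if (next == '2') {
                current.append(';');
                i++;
            } else if (next == '3') {
                current.append('[');
                i++;
            } else if (next == '0') {
                if (i + 6 > s.length()) {
                    throw new IllegalArgumentException("truncated _0xxxx escape");
                }
                current.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 5;
            } else {
                // A bare underscore separates package, class and method names;
                // an empty segment means a "__" long-form signature follows.
                if (current.length() == 0) {
                    throw new IllegalArgumentException("long-form (overloaded) names are not handled");
                }
                parts.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() == 0) {
            throw new IllegalArgumentException("unexpected trailing underscore: " + symbol);
        }
        parts.add(current.toString());
        if (parts.size() < 2) {
            throw new IllegalArgumentException("no class name in: " + symbol);
        }
        String method = parts.remove(parts.size() - 1);
        return new String[] {String.join(".", parts), method};
    }

    public static void main(String[] args) {
        String symbol = args.length > 0 ? args[0]
                : "Java_sg_vantagepoint_helloworldjni_MainActivity_stringFromJNI";
        String[] decoded = decode(symbol);
        System.out.println("class:  " + decoded[0]);
        System.out.println("method: " + decoded[1]);
    }
}
```

Fed the symbol from the rabin2 output above, it reports the class sg.vantagepoint.helloworldjni.MainActivity and the method stringFromJNI.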
To disassemble the code, you can load libnative-lib.so into any disassembler that understands ELF binaries (that is, practically any disassembler). If the app ships with binaries for different architectures, you can theoretically pick the architecture you’re most familiar with, as long as it is compatible with the disassembler. Each version is compiled from the same source and implements the same functionality. However, if you’re planning to debug the library on a live device later, it’s usually wise to pick an ARM build.
To support both older and newer ARM processors, Android apps ship with multiple ARM builds compiled for different Application Binary Interface (ABI) versions. The ABI defines how the application’s machine code is supposed to interact with the system at runtime. The following ABIs are supported:
• armeabi: ABI for ARM-based CPUs that support at least the ARMv5TE instruction set.
• armeabi-v7a: This ABI extends armeabi to include several CPU instruction set extensions.
• arm64-v8a: ABI for ARMv8-based CPUs that support AArch64, the new 64-bit ARM architecture.
Most disassemblers can handle any of those architectures. Below, we’ll be viewing the armeabi-v7a version (located in HelloWord-JNI/lib/armeabi-v7a/libnative-lib.so) in radare2 and in IDA Pro. See the section “Reviewing Disassembled Native Code” below to learn how to proceed when inspecting the disassembled native code.
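Which of the lib/&lt;abi&gt;/ builds a device actually uses depends on the ABIs it supports, in order of preference (exposed to apps as android.os.Build.SUPPORTED_ABIS since API level 21). The following Java sketch is a simplified model of that choice, not the package manager's actual implementation: it returns the first device ABI for which the APK ships a build. The class name AbiPicker and the device and APK values in main are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AbiPicker {

    // Simplified model: walk the device's ABIs in preference order and return
    // the first one for which the APK contains a lib/<abi>/ directory.
    static String pickAbi(List<String> deviceAbis, Set<String> apkAbis) {
        for (String abi : deviceAbis) {
            if (apkAbis.contains(abi)) {
                return abi;
            }
        }
        return null; // no compatible native code in the APK
    }

    public static void main(String[] args) {
        // Hypothetical: a 64-bit ARM device and an APK with 32-bit ARM and x86 builds.
        List<String> device = Arrays.asList("arm64-v8a", "armeabi-v7a", "armeabi");
        Set<String> apk = new HashSet<>(Arrays.asList("armeabi-v7a", "x86"));
        System.out.println(pickAbi(device, apk)); // armeabi-v7a
    }
}
```

This is also why the armeabi-v7a build is a reasonable choice for later on-device debugging: many 64-bit ARM devices can still run it.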
radare2
To open the file in radare2, you only have to run r2 -A HelloWord-JNI/lib/armeabi-v7a/libnative-lib.so. The chapter “Android Basic Security Testing” already introduced radare2.
Remember that you can use the flag -A to run the aaa command right after loading the binary in order to analyze all referenced code.
$ r2 -A HelloWord-JNI/lib/armeabi-v7a/libnative-lib.so
[x] Analyze all flags starting with sym. and entry0 (aa)
[x] Analyze function calls (aac)
[x] Analyze len bytes of instructions for references (aar)
[x] Check for objc references
[x] Check for vtables
[x] Finding xrefs in noncode section with anal.in=io.maps
[x] Analyze value pointers (aav)
[x] Value from 0x00000000 to 0x00001dcf (aav)
[x] 0x00000000-0x00001dcf in 0x0-0x1dcf (aav)
[x] Emulate code to find computed references (aae)
[x] Type matching analysis for all functions (aaft)
[x] Use -AA or aaaa to perform additional experimental analysis.
-- Print the contents of the current block with the 'p' command
[0x00000e3c]>
Note that for bigger binaries, starting directly with the flag -A might be very time-consuming as well as unnecessary. Depending on your purpose, you may open the binary without this option and then apply a less complex analysis such as aa (basic analysis of all functions) or a more targeted one such as aac (analyze function calls). Remember to always type ? to get help, or append it to a command to see its subcommands and options. For example, if you enter aa? you’ll get the full list of analysis commands.
[0x00001760]> aa?
Usage: aa[0*?]   # see also 'af' and 'afna'
| aa                  alias for 'af@@ sym.*;af@entry0;afva'
| aaa[?]              autoname functions after aa (see afna)
| aab                 abb across bin.sections.rx
| aac [len]           analyze function calls (af @@ `pi len~call[1]`)
| aac* [len]          flag function calls without performing a complete analysis
| aad [len]           analyze data references to code
| aae [len] ([addr])  analyze references with ESIL (optionally to address)
| aaf[e|t]            analyze all functions (e anal.hasnext=1;afr @@c:isq) (aafe=aef@@f)
| aaF [sym*]          set anal.in=block for all the spaces between flags matching glob
| aaFa [sym*]         same as aaF but uses af/a2f instead of af+/afb+ (slower but more accurate)
| aai[j]              show info of all analysis parameters
| aan                 autoname functions that either start with fcn.* or sym.func.*
| aang                find function and symbol names from golang binaries
| aao                 analyze all objc references
| aap                 find and analyze function preludes
| aar[?] [len]        analyze len bytes of instructions for references
| aas [len]           analyze symbols (af @@=`isq~[0]`)
| aaS                 analyze all flags starting with sym. (af @@ sym.*)
| aat [len]           analyze all consecutive functions in section
| aaT [len]           analyze code after trap-sleds
| aau [len]           list mem areas (larger than len bytes) not covered by functions
| aav [sat]           find values referencing a specific section or map
One thing worth noting is how radare2 differs from other disassemblers such as IDA Pro. The following quote from this article on radare2’s blog (http://radare.today/) offers a good summary.
Code analysis is not a quick operation, and not even predictable or taking a linear time to be processed. This makes starting times pretty heavy, compared to just loading the headers and strings information like it’s done by default.
People that are used to IDA or Hopper just load the binary, go out to make a coffee and then when the analysis is done, they start doing the manual analysis to understand what the program is doing. It’s true that those tools perform the analysis in background, and the GUI is not blocked. But this takes a lot of CPU time, and r2 aims to run in many more platforms than just high-end desktop computers.
That said, see the section “Reviewing Disassembled Native Code” to learn more about how radare2 can help us perform our reversing tasks much faster. For example, getting the disassembly of a specific function is a trivial task that can be performed with a single command.
IDA Pro
If you own an IDA Pro license, open the file and once in the “Load new file” dialog, choose “ELF for ARM (Shared Object)” as the file type (IDA should detect this automatically), and “ARM Little-Endian” as the processor type.
The freeware version of IDA Pro unfortunately does not support the ARM processor type.
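IDA, radare2 and readelf all take the file type and processor from the ELF header. If you want to check which architecture a particular libnative-lib.so was built for before loading it, the Java sketch below reads the relevant header fields: the class (32- or 64-bit) and data-encoding bytes in e_ident, followed by e_type and e_machine at offsets 16 and 18. Only a few e_machine values are mapped to names, and ElfHeaderInfo is a made-up class name.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ElfHeaderInfo {

    static String machineName(int machine) {
        switch (machine) {
            case 3:   return "x86 (EM_386)";
            case 40:  return "ARM (EM_ARM)";
            case 62:  return "x86-64 (EM_X86_64)";
            case 183: return "AArch64 (EM_AARCH64)";
            default:  return "machine " + machine;
        }
    }

    // Summarizes the header fields a disassembler uses to pick loader and processor.
    static String describe(byte[] h) {
        if (h.length < 20 || h[0] != 0x7f || h[1] != 'E' || h[2] != 'L' || h[3] != 'F') {
            throw new IllegalArgumentException("not an ELF file");
        }
        String elfClass = h[4] == 1 ? "ELF32" : h[4] == 2 ? "ELF64" : "unknown class";
        if (h[5] != 1 && h[5] != 2) {
            throw new IllegalArgumentException("unknown data encoding: " + h[5]);
        }
        // EI_DATA: 1 = little-endian, 2 = big-endian; multi-byte fields follow it
        ByteOrder order = h[5] == 1 ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN;
        ByteBuffer buf = ByteBuffer.wrap(h).order(order);
        int type = Short.toUnsignedInt(buf.getShort(16));    // e_type
        int machine = Short.toUnsignedInt(buf.getShort(18)); // e_machine
        String typeName = type == 3 ? "shared object" : type == 2 ? "executable" : "type " + type;
        return String.format("%s, %s-endian, %s, %s", elfClass,
                order == ByteOrder.LITTLE_ENDIAN ? "little" : "big", typeName, machineName(machine));
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ElfHeaderInfo.java HelloWord-JNI/lib/armeabi-v7a/libnative-lib.so
        System.out.println(describe(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```

For the armeabi-v7a build used above, you would expect a 32-bit, little-endian shared object for EM_ARM, which matches the “ELF for ARM (Shared Object)” and “ARM Little-Endian” choices in IDA.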