Practical Reverse Engineering

Academic year: 2023

Limitation of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties as to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including but not limited to warranties of fitness for a particular purpose. This work is sold with the understanding that the publisher is not engaged in providing legal, accounting, or other professional services.

About the Technical Editor

Matt Miller is a principal security engineer in Microsoft's Trustworthy Computing organization, where he currently focuses on the research and development of exploit mitigation technology. Prior to joining Microsoft, Matt was the lead developer for the Metasploit framework and a contributor to Uninformed magazine, where he wrote on topics related to exploits, reverse engineering, program analysis, and operating system internals.

Credits

This book represents something we wish we had when we started learning about reverse engineering more than 15 years ago. All the authors would like to thank Rolf Rolles for his contributions to the obfuscation chapter.

Acknowledgments

Contents at a Glance

Contents

Investigate and expand your knowledge
Analysis of real drivers
Chapter 4 Debugging and automation
The debugging tools and basic commands

Introduction

At this point, you will have all the background necessary to read and understand Chapter 3 "The Windows Kernel." You should also consider learning Win32 programming. With this guide and the subsequent chapters, you should be well prepared for the learning journey.

For the purposes of this chapter, x86 is the 32-bit implementation of the Intel architecture (IA-32) as defined in the Intel Software Development Manual. Protected mode is the processor state that supports virtual memory, paging, and other features; it is the state in which modern operating systems run.

Register Set and Data Types

Model-specific registers (MSRs) are accessible only to code running in ring 0 and are typically used to store special counters and implement low-level functionality. For example, the SYSENTER instruction transfers execution to the address stored in the IA32_SYSENTER_EIP MSR (0x176), which is usually the operating system's system call handler.

Instruction Set

For example, CR0 determines whether paging is on or off, CR2 contains the linear address that caused a page fault, CR3 is the base address of a paging data structure, and CR4 controls hardware virtualization settings. A classic RISC architecture such as ARM can only read/write data from/to memory with load/store instructions (LDR and STR, respectively).

Syntax

Another important characteristic is that x86 uses variable-length instructions: the instruction length can range from 1 to 15 bytes. Disassemblers, assemblers, and other reverse engineering tools (IDA Pro, OllyDbg, MASM, etc.) on Windows typically use Intel notation, while those on UNIX often follow AT&T (GCC) notation.

Data Movement

EBX is probably the base address of another structure of the same type.

03: A5 movsd

STOS is the same as SCAS except that it writes the value AL/AX/EAX to EDI.

Exercise

Arithmetic Operations

For example, dividing integers is a relatively slow operation, but if the divisor is a power of two, it can be reduced to shifting bits to the right; 100/2 is the same as 100>>1. Similarly, multiplication by a power of two can be reduced to shifting bits to the left; 100*2 is the same as 100<<1.
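The shift equivalences above can be checked with a minimal Python sketch. Python is used here only for illustration, and the helper names `div_pow2` and `mul_pow2` are hypothetical. The sketch covers non-negative integers only.

```python
def div_pow2(value: int, k: int) -> int:
    """Divide a non-negative integer by 2**k using a right shift."""
    return value >> k


def mul_pow2(value: int, k: int) -> int:
    """Multiply an integer by 2**k using a left shift."""
    return value << k


# 100/2 expressed as 100>>1, and 100*2 expressed as 100<<1
print(div_pow2(100, 1) == 100 // 2)
print(mul_pow2(100, 1) == 100 * 2)
```

For negative signed values the two forms can differ: x86 IDIV truncates toward zero, while an arithmetic right shift (SAR) rounds toward negative infinity, so compilers usually emit a small adjustment around the shift.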

Stack Operations and Function Invocation

It pushes the return address (the address immediately after the CALL instruction) onto the stack. At this point, the top of the stack contains the return address stored by the CALL instruction at 0x4129F9.
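As a rough model of what CALL and RET do to the stack, the following Python sketch may help. It assumes the CALL at 0x4129F9 uses the common 5-byte near encoding (opcode E8 plus a 32-bit displacement), which the excerpt does not state; the starting ESP value and the function names are hypothetical.

```python
CALL_REL32_LENGTH = 5  # assumption: near CALL encoded as E8 + 32-bit displacement


def do_call(esp, stack, call_address):
    """Model CALL: push the return address and return the new ESP."""
    return_address = call_address + CALL_REL32_LENGTH
    esp -= 4  # the x86 stack grows toward lower addresses
    stack[esp] = return_address
    return esp


def do_ret(esp, stack):
    """Model RET: pop the return address into EIP; return (new ESP, EIP)."""
    eip = stack[esp]
    return esp + 4, eip


stack = {}
esp = do_call(0x0012FF00, stack, 0x4129F9)  # 0x0012FF00 is an arbitrary starting ESP
print(hex(esp), hex(stack[esp]))
```

The dictionary stands in for memory; a real stack also holds arguments and saved registers around the return address.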

Figure 1-3 illustrates the stack layout.

Exercises

This two-instruction sequence is commonly referred to as the function epilogue because it is at the end of the function and restores the previous function frame. If the function addme had local variables, the code would have to grow the stack by subtracting from ESP after line 2.

Control Flow

Lines 8–9 check whether the length of the string is less than or equal to zero. Lines 15–16 check whether EDX is less than the length of the string; if so, execution returns to the beginning of the loop.
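Whether a comparison is signed or unsigned depends on which flags the conditional jump tests. The sketch below models the flags produced by a 32-bit CMP and a few common Jcc conditions; the function names and the dictionary layout are hypothetical and chosen only for illustration.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(a, b):
    """Return the flags a 32-bit CMP a, b would set (computes a - b)."""
    a &= MASK32
    b &= MASK32
    r = (a - b) & MASK32
    return {
        "ZF": int(r == 0),                # result is zero
        "SF": r >> 31,                    # sign bit of the result
        "CF": int(a < b),                 # unsigned borrow
        "OF": ((a ^ b) & (a ^ r)) >> 31,  # signed overflow
    }


CONDITIONS = {
    "E": lambda f: f["ZF"] == 1,                       # equal
    "NE": lambda f: f["ZF"] == 0,                      # not equal
    "B": lambda f: f["CF"] == 1,                       # below (unsigned)
    "BE": lambda f: f["CF"] == 1 or f["ZF"] == 1,      # below or equal (unsigned)
    "A": lambda f: f["CF"] == 0 and f["ZF"] == 0,      # above (unsigned)
    "L": lambda f: f["SF"] != f["OF"],                 # less (signed)
    "LE": lambda f: f["ZF"] == 1 or f["SF"] != f["OF"],  # less or equal (signed)
    "G": lambda f: f["ZF"] == 0 and f["SF"] == f["OF"],  # greater (signed)
    "GE": lambda f: f["SF"] == f["OF"],                # greater or equal (signed)
}


def jcc_taken(cond, a, b):
    """Would J<cond> be taken after CMP a, b?"""
    return CONDITIONS[cond](cmp_flags(a, b))


# -1 is less than 0 when signed, but 0xFFFFFFFF is above 0 when unsigned
print(jcc_taken("L", -1, 0), jcc_taken("B", -1, 0))
```

Seeing JL/JLE/JG in a loop therefore suggests a signed variable, while JB/JBE/JA suggests an unsigned one.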

Table 1-3: Common Conditional Codes

System Mechanism

As an exercise, you should decompile this function so that it looks more "natural" (as opposed to our literal translation). Besides the normal Jcc constructs, certain loops can be implemented using the LOOP instruction.

Address Translation

According to the documentation, the lower 12 bits of a PDPT entry are flag/reserved bits, and the remaining bits are used as the physical address of the PD base. After the whole process, it is determined that the virtual address 0xBF80EE6B translates to the physical address 0x6694E6B.
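The index extraction behind this translation can be sketched in Python, assuming x86 PAE paging with 4KB pages (2 bits of PDPT index, 9 bits of PD index, 9 bits of PT index, 12 bits of offset). The function names are hypothetical, and the page table entry value used at the end is a made-up example chosen to be consistent with the result quoted above, not a value from the book.

```python
def split_pae_va(va):
    """Split a 32-bit virtual address into PAE paging indices."""
    return {
        "pdpt_index": (va >> 30) & 0x3,   # bits 31-30
        "pd_index": (va >> 21) & 0x1FF,   # bits 29-21
        "pt_index": (va >> 12) & 0x1FF,   # bits 20-12
        "offset": va & 0xFFF,             # bits 11-0
    }


def entry_base(entry):
    """Clear the low 12 flag/reserved bits of a paging-structure entry.

    Simplification: high reserved/NX bits of a 64-bit entry are ignored.
    """
    return entry & ~0xFFF


parts = split_pae_va(0xBF80EE6B)
print({name: hex(value) for name, value in parts.items()})

# Hypothetical PTE value; only its upper bits (the page base) matter here.
hypothetical_pte = 0x06694021
physical = entry_base(hypothetical_pte) | parts["offset"]
print(hex(physical))
```

Each level repeats the same step: mask off the low 12 bits of the entry to get the next table's physical base, then index into it.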

Interrupts and Exceptions

Hopefully, the next time your program accesses memory, you'll appreciate the processor more. When the processor receives an interrupt, it executes the handler at the index associated with that interrupt in the interrupt descriptor table (IDT) and then resumes execution where it was before the interrupt occurred.

Walk-Through

If it is, the return value is set to 0 and returned (lines 30–34); otherwise, execution continues at line 35. If it is, the return value is set to 1 and returned; otherwise, execution continues at line 86.

_In_opt_ LPSECURITY_ATTRIBUTES lpThreadAttributes, _In_ SIZE_T dwStackSize,

Canonical Address

Function Invocation

This architecture was useful beyond their limited product line, so a company called ARM Holdings was formed to license the architecture for use in a wide variety of products. The first version of the architecture was introduced in 1985, and at the time of this writing it is at version 7 (ARMv7).

Basic Features

Additional extensions can be added to the processor; for example, the Jazelle extension enables Java bytecode to be executed natively on the processor. Some instructions are available in ARM mode but not in Thumb mode, and vice versa.

Data Types and Registers

Similar to other architectures, ARM stores information about the current execution state in the Current Program Status Register (CPSR). One way to explicitly switch from Thumb to ARM (and vice versa) is to change this bit (the T bit).

System-Level Controls and Settings

Introduction to the Instruction Set

Some will display STMEA if the base register is SP, and STMIA for other registers; some always use STM; and some always use STMIA.

Loading and Storing Data

LDR and STR

All of the preceding examples use offset addressing mode, which means that the base register is never changed. Post-indexed addressing mode means that the base register is used as the final address and is then updated with the calculated offset.
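The difference between the addressing modes comes down to which address is used for the access and whether the base register is written back. Here is a small Python model, with a dictionary standing in for memory; the `ldr` helper and the mode names are hypothetical, and the pre-indexed case is included for completeness even though this excerpt does not quote it.

```python
def ldr(memory, regs, base, offset, mode):
    """Model an ARM LDR; return the loaded value and update regs in place."""
    if mode == "offset":    # LDR Rd, [Rn, #off]   base unchanged
        address = regs[base] + offset
    elif mode == "pre":     # LDR Rd, [Rn, #off]!  base updated before the access
        address = regs[base] + offset
        regs[base] = address
    elif mode == "post":    # LDR Rd, [Rn], #off   base used as-is, then updated
        address = regs[base]
        regs[base] += offset
    else:
        raise ValueError(f"unknown addressing mode: {mode}")
    return memory[address]


memory = {0x1000: 11, 0x1004: 22}  # arbitrary example contents
regs = {"r1": 0x1000}
value = ldr(memory, regs, "r1", 4, "post")
print(value, hex(regs["r1"]))
```

Post-indexed loads with a constant offset are a common sign of a loop walking an array or buffer.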

Other Usage for LDR

Another related instruction is ADR, which gets the address of a label/function and puts it in a register. Internally, this instruction simply calculates an offset from PC and stores it in the destination register.

LDM and STM

Because LDM and STM can move multiple words at once, they are typically used in block copy or move operations. They are simply pseudo-instructions for LDM/STM instructions in various modes (IA, IB, etc.).

PUSH and POP

Functions and Function Invocation

BX is similar to B in that it transfers control to a target, but it can also switch between ARM and Thumb state, and the target address is stored in a register. BL is similar to B except that it also stores the return address in LR before transferring control to the target offset.

Branching and Conditional Execution

The suffix "S" indicates that the instruction should set the arithmetic conditional flags (zero, negative, etc.) depending on the result. By default, instructions do not update the conditional flags unless the "S" suffix is used; the comparison instructions (CBZ, CMP, TST, CMN, and TEQ) update the flags automatically because they are usually used before branch instructions.

Table 2-1: Conditional code and meaning

Thumb State

You have seen that the branch instruction (B) can be made to do conditional branches by adding a suffix (BEQ, BLE, BLT, BLS, etc.). Because of its flexibility, the IT instruction can be used to reduce the number of instructions needed to implement short conditions in Thumb state.

Switch-Case

In Thumb mode, the same concept applies, except that the jump table contains offsets instead of addresses. As in the previous example, the jump table is usually placed after the TBB/TBH instruction.
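To resolve the case targets by hand, the offset-based table can be modeled as follows. This is a sketch of the TBB/TBH computation (branch to PC plus twice the table entry), assuming the table immediately follows the instruction; the function name, the instruction address, and the table bytes are hypothetical.

```python
def table_branch_target(instruction_address, table, index):
    """Compute the TBB/TBH target for a given case index.

    In Thumb state, PC reads as the address of the current instruction plus 4.
    Table entries are bytes for TBB and halfwords for TBH; both are doubled.
    """
    pc = instruction_address + 4
    return pc + 2 * table[index]


table = [2, 5, 9]  # hypothetical jump table entries
targets = [table_branch_target(0x8000, table, i) for i in range(len(table))]
print([hex(t) for t in targets])
```

Doubling the entry is what lets a single byte reach up to 510 bytes past the table-branch instruction.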

Miscellaneous

Just-in-Time and Self-Modifying Code

Synchronization Primitives

These are barrier instructions that ensure that memory accesses and instruction fetches are synchronized before the subsequent instructions are executed. For this reason, you will often see these instructions used in code that implements locks.

System Services and Mechanisms

For example, on Linux x86, you can use interrupt 0x80 or the special instruction SYSENTER to perform a system call; on x64 this is provided by the SYSCALL instruction. On ARM, there is no dedicated system call instruction, so a software interrupt is used to implement system calls.

Instructions

You can infer that the field type is short because of the LDRH instruction (loads a half word). The base address of the array is R3 on line 33 because it is being indexed by R2.

Figure 2-7 illustrates the relationships between the four structures.

Next Steps

It is recommended that you write comments and notes, and draw connections between branches/labels, on the exercise itself. Not all structure fields can be recovered because the function may access only a few of them.

The Windows Kernel

If the process of reverse engineering Windows drivers could be modeled as a discrete task, 90% would be understanding how Windows works and 10% would be understanding assembly code. It then explains concepts such as threads, processes, memory, and interrupts, and how they are used in the kernel and drivers.

Windows Fundamentals

Therefore, this chapter is written as an introduction to the Windows kernel for reverse engineers.

Memory Layout

With /3GB, the user address space increases to 3GB and the remaining 1GB is for the kernel. This region, usually referred to as the no-access region, is there so that the kernel does not accidentally cross the address boundary and corrupt user-mode memory.
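A quick way to classify an x86 address during analysis is to compare it with the user/kernel boundary. The sketch below assumes the default 2GB/2GB split and the 3GB/1GB split described above; the constant and function names are hypothetical, and the no-access region near the boundary is deliberately ignored.

```python
USER_KERNEL_BOUNDARY_DEFAULT = 0x80000000  # default: 2GB user, 2GB kernel
USER_KERNEL_BOUNDARY_3GB = 0xC0000000      # with /3GB: 3GB user, 1GB kernel


def is_kernel_address(address, three_gb=False):
    """Return True if a 32-bit address falls in the kernel half.

    Simplification: the no-access region just below the boundary is ignored.
    """
    boundary = USER_KERNEL_BOUNDARY_3GB if three_gb else USER_KERNEL_BOUNDARY_DEFAULT
    return address >= boundary


print(is_kernel_address(0xBF80EE6B))
print(is_kernel_address(0xBF80EE6B, three_gb=True))
```

The same address can therefore belong to the kernel on a default system and to user space on a system booted with /3GB.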

Processor Initialization

NOTE It is possible to change this default behavior by specifying the /3GB switch in the boot options. It is stored in the FS segment (x86), the GS segment (x64), or in one of the system coprocessor registers (ARM).

System Calls

However, on x64 and ARM it is an array of 32-bit integers that encode the system call offset and the number of arguments passed on the stack. NtCreateFile sets EAX to 0x42 because this is the system call number for NtCreateFile in the kernel.
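One way such an encoded entry could be unpacked is sketched below. The bit layout is an assumption not stated in this excerpt: the low 4 bits are taken as the number of stack arguments and the upper 28 bits as a signed offset relative to the base of the service table, which matches the commonly described x64 layout. The function names and sample values are hypothetical.

```python
def decode_service_entry(entry):
    """Split a 32-bit service table entry into (offset, stack_args).

    Assumed layout: bits 3-0 = stack argument count, bits 31-4 = signed offset.
    """
    entry &= 0xFFFFFFFF
    stack_args = entry & 0xF
    offset = entry >> 4
    if entry & 0x80000000:  # sign-extend the 28-bit offset
        offset -= 1 << 28
    return offset, stack_args


def service_routine_address(table_base, entry):
    """Resolve the routine address as table base plus decoded offset."""
    offset, _ = decode_service_entry(entry)
    return table_base + offset


offset, stack_args = decode_service_entry(0x02C45F04)  # made-up entry value
print(hex(offset), stack_args)
```

Verify the layout against the target Windows build before relying on it, since the encoding is an implementation detail.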

Figure 3-3 illustrates the IDT on x86.

Interrupt Request Level

APC_LEVEL (1): The IRQL at which asynchronous procedure calls (APCs) are executed. See the section "Asynchronous Procedure Calls."
DISPATCH_LEVEL (2): The thread dispatcher and deferred procedure calls (DPCs) run at this IRQL. See the section "Deferred Procedure Calls." Code running at this IRQL cannot wait.

Pool Memory

Memory Descriptor Lists

Suppose a driver needs to map memory in the kernel space to the user mode address space of a process or vice versa. Another scenario is when a driver needs to write to some read-only pages (such as those in the code section).

Processes and Threads

To achieve this, the driver would first initialize an MDL to describe the memory buffer (IoAllocateMdl), ensure that the current thread has access to those pages and lock them (MmProbeAndLockPages), and then map those pages into memory in that process (MmMapLockedPagesSpecifyCache).

NOTE Although we say these structures should be accessed only through documented kernel routines, real-world rootkits modify semi-documented or completely undocumented fields in them to achieve their goals.

Execution Context

When a user-mode application sends a request (IOCTL) to a driver, the driver's IOCTL handler runs in thread context (i.e., the context of the user-mode thread that initiated the request). APCs run in thread context (i.e., the context of the thread in which the APC was queued).

Kernel Synchronization Primitives

After initialization, they can be acquired and released through various documented APIs; see the Windows Driver Kit (WDK) documentation for more information.

Lists

While conceptually similar to mutexes, they are used to protect shared resources accessed at DISPATCH_LEVEL or higher IRQL. Therefore, you need to understand their implementation details and usage patterns so that you can recognize them at the assembly level.

Implementation Details

In practice, this macro is usually given the address of the LIST_ENTRY field inside a list entry and returns the address of the structure that contains it. PsLoadedModuleList is the head of a list whose entries are of type KLDR_DATA_TABLE_ENTRY.
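The pointer arithmetic behind this kind of macro (CONTAINING_RECORD in the WDK) is simply "field address minus field offset". The following Python sketch reproduces it with ctypes; EXAMPLE_ENTRY is a hypothetical structure invented for illustration, not a real kernel type, and `containing_record` is a hypothetical helper.

```python
import ctypes


class LIST_ENTRY(ctypes.Structure):
    _fields_ = [("Flink", ctypes.c_void_p), ("Blink", ctypes.c_void_p)]


class EXAMPLE_ENTRY(ctypes.Structure):  # hypothetical structure for illustration
    _fields_ = [
        ("Id", ctypes.c_uint32),
        ("Links", LIST_ENTRY),  # embedded list linkage, not at offset 0
        ("Size", ctypes.c_uint32),
    ]


def containing_record(field_address, struct_type, field_name):
    """Return the base address of the structure that contains the field."""
    return field_address - getattr(struct_type, field_name).offset


entry = EXAMPLE_ENTRY(Id=7)
links_address = ctypes.addressof(entry) + EXAMPLE_ENTRY.Links.offset
base = containing_record(links_address, EXAMPLE_ENTRY, "Links")
recovered = EXAMPLE_ENTRY.from_address(base)
print(recovered.Id)
```

In disassembly this usually appears as a subtraction of a small constant from a Flink or Blink pointer, or as field accesses at negative offsets from it.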

Asynchronous and Ad-Hoc Execution

System Threads

Work Items

What actually happens is that each processor has different queues to store the work items and there is a system thread that pulls one item at a time from the queue for execution. Explain how we were able to determine that ExpWorkerThread is the system thread responsible for dequeuing and executing work items.

Asynchronous Procedure Calls

Due to their lightweight nature, enqueuing work items in a DPC is a common driver programming pattern. We used the same method to find out how the kernel sends work items.

Deferred Procedure Calls

Internally, the insertion/removal of DPCs from the DPC queue takes place on this field. The data is stored in the DpcData field of the KPRCB structure associated with the DPC.

Timers

When these routines are called, the timer is inserted into a timer table in the PRCB (TimerTable->TimerListEntry). Once set and queued, a timer can be canceled and thus removed from the timer table.

Process and Thread Callbacks

Identify other similar callbacks documented in the WDK and investigate how they work (processor, memory, and so on).

Completion Routines

For example, they can set a completion routine to modify the return buffer from a lower driver before returning it to user mode.

I/O Request Packets

The dynamic part immediately follows the header; it is an array of IO_STACK_LOCATION structures that contain device-specific request information. An IO_STACK_LOCATION contains the IRP's major and minor function codes, the parameters for the request, and an optional completion routine.

Figure 3-8 illustrates the relationship between these two structures in an IRP.

Structure of a Driver

Note that the "next" stack location is the element immediately above the "current" one (not after it). A driver can allocate an IRP with IoAllocateIrp, associate it with a thread, fill in the IRP major and minor codes, set the IO_STACK_LOCATION count/size, fill in the parameters, and send it to the destination device for processing with IoCallDriver.

Entry Points

File system minifilter driver: a driver that interacts with the file system to intercept file I/O requests. This is why drivers usually have to register dispatch routines with the I/O manager (see the next section).

Driver and Device Objects

A driver can "attach" one of its own device objects to another device object so that it receives I/O requests intended for the target device object. The AttachedDevice field points to the device object that is attached to the current device object.

IRP Handling

This attachment mechanism is used to support filter drivers so that they can modify or inspect requests to other drivers.

A Common Mechanism for User-Kernel Communication

The driver accesses this kernel-mode buffer through the AssociatedIrp.SystemBuffer field in the IRP structure. When using this method, the I/O manager does not perform any validation on the user data; it passes the raw data to the driver.
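The buffering method a driver expects is encoded in the low two bits of the IOCTL code, so it can be read straight out of a constant found in a disassembly. The sketch below follows the layout of the WDK's CTL_CODE macro (device type, access, function, method); the Python helper names are hypothetical and the sample code value is constructed here rather than taken from the book.

```python
METHODS = {
    0: "METHOD_BUFFERED",
    1: "METHOD_IN_DIRECT",
    2: "METHOD_OUT_DIRECT",
    3: "METHOD_NEITHER",
}


def ctl_code(device_type, function, method, access):
    """Build an IOCTL code the same way the CTL_CODE macro does."""
    return (device_type << 16) | (access << 14) | (function << 2) | method


def decode_ioctl(code):
    """Split an IOCTL code into its four fields."""
    return {
        "device_type": (code >> 16) & 0xFFFF,
        "access": (code >> 14) & 0x3,
        "function": (code >> 2) & 0xFFF,
        "method": METHODS[code & 0x3],
    }


# 0x22 is FILE_DEVICE_UNKNOWN; function 0x800 is an arbitrary example value
code = ctl_code(0x22, 0x800, 3, 0)
print(hex(code), decode_ioctl(code))
```

Knowing the method tells you where the handler will look for its input: the system buffer, an MDL, or raw user-mode pointers.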

Miscellaneous System Mechanisms

Also, the system call table (KiServiceTable) is not exported, so there is no easy way to access it from a driver. Another method is to disassemble the system call stub and retrieve the index from there.

Walk-Throughs

A file-backed section is one whose memory contents are the contents of a file on disk; changes made to the section are written directly to disk. A page-file-backed section is one whose contents are backed by the page file; changes to such a section are discarded after it is closed.

An x86 Rootkit

The first index is 0, which corresponds to IRP_MJ_CREATE; so you know that sub_10300 is the handler for that IRP. We can make an educated guess that this is the CurrentStackLocation field because of the code context (which occurs at the start of an IRP handler).

An x64 Rootkit
