Software Execution Environments (Virtual Machines)

Some software development platforms don’t produce executable machine code that runs directly on a processor. Instead, they generate an intermediate representation of the program, known as bytecode. This bytecode is then read by a special program on the user’s machine, called a virtual machine, which executes it on the local processor. A given virtual machine implementation is platform-specific, meaning that it only runs on the platform it was built for. However, many bytecode formats have virtual machine implementations for several platforms, which allows the same bytecode program to run on each of them.

Two common virtual machine architectures are the Java Virtual Machine (JVM) that runs Java programs, and the Common Language Runtime (CLR) that runs Microsoft .NET applications.

Programs that run on virtual machines have several significant benefits compared to native programs executed directly on the underlying hardware:

■■ Platform isolation: Because the program reaches the end user in a generic representation that is not machine-specific, it can be executed on any computer platform for which a compatible execution environment exists. The software vendor doesn’t have to worry about platform compatibility issues (at least theoretically), because the execution environment stands between the program and the system and encapsulates any platform-specific aspects.


■■ Enhanced functionality: When a program is running under a virtual machine, it can (and usually does) benefit from a wide range of enhanced features that are rarely found on real silicon processors. This can include features such as garbage collection, which is an automated system that tracks resource usage and automatically releases memory objects once they are no longer in use. Another prominent feature is runtime type safety: because virtual machines have accurate data type information on the program being executed, they can verify that type safety is maintained throughout the program. Some virtual machines can also track memory accesses and make sure that they are legal.

Because the virtual machine knows the exact length of each memory block and is able to track its usage throughout the application, it can easily detect cases where the program attempts to read or write beyond the end of a memory block, and so on.
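The bounds checking described above can be sketched in a few lines. This is only an illustration: the `ManagedBlock` class and its method names are hypothetical and do not correspond to any real virtual machine's internals.

```python
class ManagedBlock:
    """A fixed-size memory block whose length is known to a hypothetical VM."""

    def __init__(self, size):
        self._cells = [0] * size

    def _check(self, index):
        # The VM knows the exact block length, so every access can be validated
        # before it touches memory. Negative offsets are rejected as well.
        if not isinstance(index, int) or not 0 <= index < len(self._cells):
            raise IndexError(
                f"illegal access at offset {index} (block length {len(self._cells)})"
            )

    def read(self, index):
        self._check(index)
        return self._cells[index]

    def write(self, index, value):
        self._check(index)
        self._cells[index] = value


block = ManagedBlock(4)
block.write(3, 99)            # last valid offset
try:
    block.write(4, 1)         # one past the end of the block
    overflow_detected = False
except IndexError:
    overflow_detected = True  # the access is refused instead of corrupting memory
```

In a native program the same out-of-bounds write would typically go ahead and overwrite whatever happens to follow the block.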

Bytecodes

The interesting thing about virtual machines is that they almost always have their own bytecode format. This is essentially a low-level language that is just like a hardware processor’s assembly language (such as the IA-32 assembly language). The difference of course is in how such binary code is executed.
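To get a feel for how closely bytecode resembles assembly language, it can help to look at a real listing. The sketch below uses CPython as a convenient stand-in (it is not one of the platforms discussed in this chapter) and its standard `dis` module. The function `add_tax` is a made-up example, and the exact opcodes printed vary between Python versions.

```python
import dis


def add_tax(price, rate):
    return price + price * rate


# Each instruction has a mnemonic (opname) and an optional argument,
# much like a line of assembly language.
instructions = list(dis.get_instructions(add_tax))
for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)

# The encoded bytecode itself is just a sequence of bytes.
raw_bytes = add_tax.__code__.co_code
```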

Unlike conventional binary programs, in which each instruction is decoded and executed by the hardware, virtual machines perform their own decoding of the program binaries. This is what enables such tight control over everything that the program does; because each instruction that is executed must pass through the virtual machine, the VM can monitor and control any operations performed by the program.

The distinction between bytecode and regular processor binary code has slightly blurred during the past few years. Several companies have been developing bytecode processors that can natively run bytecode languages, which were previously only supported on virtual machines. In Java, for example, there are companies such as Imsys and aJile that offer “direct execution processors” that directly execute the Java bytecode without the use of a virtual machine.

Interpreters

The original approach for implementing virtual machines has been to use interpreters. Interpreters are programs that read a program’s bytecode executable, decipher each instruction, and “execute” it in a virtual environment implemented in software. It is important to understand that not only are these instructions not directly executed on the host processor, but also that the data accessed by the bytecode program is managed by the interpreter. This means that the bytecode program would not have direct access to the host CPU’s registers. Any “registers” accessed by the bytecode would usually have to be mapped to memory by the interpreter.

Interpreters have one major drawback: performance. Because each instruction is separately decoded and executed by a program running under the real CPU, the program ends up running significantly slower than it would were it running directly on the host’s CPU. The reasons for this become obvious when one considers the amount of work the interpreter must carry out in order to execute a single high-level bytecode instruction.

For each instruction, the interpreter must jump to a special function or code area that deals with it, determine the operands involved, and modify the system state to reflect the changes. Even a well-optimized interpreter typically ends up executing dozens of instructions on the physical CPU for each bytecode instruction. As a result, interpreted programs often run many times slower than their compiled counterparts, in some cases by an order of magnitude or more.
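The per-instruction work described above can be sketched as a minimal interpreter loop. The bytecode format here is invented for illustration (four bytes per instruction: opcode plus three operands), and the opcode values and handler names are assumptions, not part of any real virtual machine.

```python
# Hypothetical opcodes for a made-up register-based bytecode.
HALT, LOADI, ADD, MUL = 0x00, 0x01, 0x02, 0x03


def op_loadi(regs, a, b, c):
    regs[a] = b                      # load immediate value b into virtual register a


def op_add(regs, a, b, c):
    regs[a] = regs[b] + regs[c]


def op_mul(regs, a, b, c):
    regs[a] = regs[b] * regs[c]


HANDLERS = {LOADI: op_loadi, ADD: op_add, MUL: op_mul}


def run(bytecode, num_regs=4):
    regs = [0] * num_regs            # virtual "registers" live in ordinary memory
    pc = 0                           # virtual program counter
    while True:
        opcode, a, b, c = bytecode[pc:pc + 4]   # fetch and determine operands
        if opcode == HALT:
            return regs
        HANDLERS[opcode](regs, a, b, c)         # jump to the handler, update state
        pc += 4


program = bytes([
    LOADI, 0, 6, 0,    # r0 = 6
    LOADI, 1, 7, 0,    # r1 = 7
    MUL,   2, 0, 1,    # r2 = r0 * r1
    ADD,   3, 2, 0,    # r3 = r2 + r0
    HALT,  0, 0, 0,
])
result = run(program)  # [6, 7, 42, 48]
```

Even in this tiny sketch, a single `ADD` costs a slice, an unpack, a dictionary lookup, a function call, and several list accesses on the host, which is where the slowdown comes from.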

Just-in-Time Compilers

Modern virtual machine implementations typically avoid relying solely on interpreters because of the performance issues described above. Instead they employ just-in-time compilers, or JITs. Just-in-time compilation is an alternative approach for running bytecode programs without the performance penalty associated with interpreters.

The idea is to take snippets of program bytecode at runtime and compile them into the native processor’s machine language before running them. These snippets are then executed natively on the host’s CPU. This is usually an ongoing process where chunks of bytecode are compiled on demand, whenever they are required (hence the term just-in-time).
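The compile-on-demand structure can be modeled in a short sketch. Two loud caveats: the bytecode format and the `TinyJit` class are hypothetical, and the "compilation" step here translates a snippet into a Python function using the built-in `compile()` and `exec()`, not into native machine code as a real JIT would. Only the shape of the process (translate once on first use, cache, then run the translated form) is being illustrated.

```python
# Hypothetical opcodes: four bytes per instruction (opcode, a, b, c).
LOADI, ADD, MUL = 0x01, 0x02, 0x03

# One line of generated source per bytecode instruction.
TEMPLATES = {
    LOADI: "    regs[{a}] = {b}",
    ADD:   "    regs[{a}] = regs[{b}] + regs[{c}]",
    MUL:   "    regs[{a}] = regs[{b}] * regs[{c}]",
}


class TinyJit:
    def __init__(self):
        self.cache = {}          # snippet bytes -> translated function
        self.compilations = 0

    def _compile(self, snippet):
        # Decode the snippet once and emit straight-line code for it.
        lines = ["def compiled(regs):"]
        for pc in range(0, len(snippet), 4):
            opcode, a, b, c = snippet[pc:pc + 4]
            lines.append(TEMPLATES[opcode].format(a=a, b=b, c=c))
        lines.append("    return regs")
        namespace = {}
        exec(compile("\n".join(lines), "<jit>", "exec"), namespace)
        self.compilations += 1
        return namespace["compiled"]

    def run(self, snippet, regs):
        fn = self.cache.get(snippet)
        if fn is None:           # first use: translate just in time
            fn = self.cache[snippet] = self._compile(snippet)
        return fn(regs)          # later uses skip decoding entirely


snippet = bytes([LOADI, 0, 6, 0, LOADI, 1, 7, 0, MUL, 2, 0, 1])
jit = TinyJit()
first = jit.run(snippet, [0, 0, 0])    # translated, then executed
second = jit.run(snippet, [0, 0, 0])   # served from the cache
```

After both calls, `jit.compilations` is 1: the decoding cost is paid once per snippet rather than once per executed instruction.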

Reversing Strategies

Reversing bytecode programs is often an entirely different experience from reversing conventional, native executable programs. First and foremost, most bytecode languages carry far more detail than their native machine code counterparts. For example, Microsoft .NET executables contain highly detailed data type information called metadata. Metadata provides information on classes, function parameters, local variable types, and much more.


Having this kind of information completely changes the reversing experience because it brings us much closer to the original high-level representation of the program. In fact, this information allows for the creation of highly effective decompilers that can reconstruct remarkably readable high-level language representations from bytecode executables. This situation is true for both Java and .NET programs, and it presents a problem to software vendors working on those platforms, who have a hard time protecting their executables from being easily reverse engineered. The solution in most cases is to use obfuscators: programs that try to eliminate as much sensitive information from the executable as possible (while keeping it functional).
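As a rough analogy for how much a bytecode executable can reveal, the sketch below inspects a compiled CPython function. CPython is used only as a stand-in here; it is not .NET or Java, and its code-object attributes are not the same thing as .NET metadata. The function `add_tax` is a made-up example.

```python
def add_tax(price, rate):
    total = price + price * rate
    return total


code = add_tax.__code__

# Names survive compilation and can be read straight from the compiled object.
function_name = code.co_name                         # 'add_tax'
param_names = code.co_varnames[:code.co_argcount]    # ('price', 'rate')
local_names = code.co_varnames[code.co_argcount:]    # ('total',)
```

Identifiers like these are exactly what a decompiler relies on to produce readable output, and they are the kind of information an obfuscator would try to strip or rename.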

Depending on the specific platform and on how aggressively an executable is obfuscated, reversers have two options: they can either use a decompiler to reconstruct a high-level representation of the target program or they can learn the native low-level language in which the program is presented and simply read that code and attempt to determine the program’s design and purpose.

Luckily, these bytecode languages are typically fairly easy to deal with because they are not as low-level as the average native processor assembly language.

Chapter 12 provides an introduction to Microsoft’s .NET platform and to its native language, the Microsoft Intermediate Language (MSIL), and demonstrates how to reverse programs written for the .NET platform.
