Book: Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software

The system has three hardware components:

  • The central processing unit (CPU) executes code.

  • The main memory of the system (RAM) stores all data and code.

  • An input/output system (I/O) interfaces with devices such as hard drives, keyboards, and monitors.

The CPU contains several components. The control unit gets instructions to execute from RAM using a register (the instruction pointer), which stores the address of the instruction to execute. Registers are the CPU’s basic data storage units and are often used to save time so that the CPU doesn’t need to access RAM. The arithmetic logic unit (ALU) executes an instruction fetched from RAM and places the results in registers or memory. The process of fetching and executing instruction after instruction is repeated as a program runs.
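The fetch-and-execute cycle described above can be sketched as a loop. This is a toy model, not x86: the three instructions (load, add, halt) and the register names ip and acc are hypothetical and chosen only for illustration.

```python
def run(program):
    """Toy fetch-execute loop; 'program' plays the role of RAM."""
    registers = {"ip": 0, "acc": 0}          # ip = instruction pointer
    while registers["ip"] < len(program):
        op, arg = program[registers["ip"]]   # control unit fetches via ip
        registers["ip"] += 1                 # ip now addresses the next instruction
        if op == "load":                     # "ALU" executes, result goes to a register
            registers["acc"] = arg
        elif op == "add":
            registers["acc"] += arg
        elif op == "halt":
            break
    return registers


state = run([("load", 2), ("add", 40), ("halt", 0)])
print(state)
```

The loop shows only the ordering (fetch, advance the instruction pointer, execute); a real CPU decodes binary opcodes rather than tuples.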


Although the four major sections of memory are usually drawn in a particular order, these pieces may be located throughout memory. For example, there is no guarantee that the stack will be lower than the code or vice versa.

The mnemonic is a word that identifies the instruction to execute, such as mov, which moves data. Operands are typically used to identify information used by the instruction, such as registers or data.

For example, the opcodes for the instruction mov ecx, 0x42 are B9 42 00 00 00. The value 0xB9 corresponds to mov ecx, and the remaining bytes 42 00 00 00 hold the 32-bit value 0x42, stored least significant byte first (little-endian).
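As a small illustration of that byte layout, the following sketch splits the five bytes into the opcode byte and the immediate value. It uses Python's standard struct module and assumes little-endian encoding of the 32-bit immediate.

```python
import struct

# Raw bytes of the instruction discussed above: mov ecx, 0x42
code = bytes.fromhex("B9 42 00 00 00")

opcode = code[0]                            # 0xB9 identifies "mov ecx, <32-bit value>"
(imm32,) = struct.unpack("<I", code[1:5])   # "<I" = little-endian unsigned 32-bit

print(hex(opcode), hex(imm32))
```

Reading the same four bytes as big-endian (">I") would give 0x42000000, which is why the byte order matters when you read opcodes by hand.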


  • Register operands refer to registers, such as ecx in mov ecx, 0x42.

  • Memory address operands refer to a memory address that contains the value of interest, typically denoted by a value, register, or equation between brackets, such as [eax].

The most common x86 registers fall into the following four categories:

    • General registers are used by the CPU during execution.

    • Segment registers are used to track sections of memory.

    • Status flags are used to make decisions.

    • Instruction pointers are used to keep track of the next instruction to execute.

    You can use this breakdown as a reference throughout this chapter to see how a register is categorized. The sections that follow discuss each of these register categories in depth.

    Each general register can be referenced in several ways. Take the EAX register as an example: if the 32-bit (4-byte) register EAX contains the value 0xA9DC81F5, code can reference the data inside EAX in three additional ways: AX (2 bytes) is 0x81F5, AL (1 byte) is 0xF5, and AH (1 byte) is 0x81.
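The EAX breakdown can be checked with a few bit masks. This is only a sketch of the arithmetic; the variable names simply mirror the register names.

```python
eax = 0xA9DC81F5

ax = eax & 0xFFFF          # low 16 bits of EAX
al = eax & 0xFF            # low 8 bits of AX
ah = (eax >> 8) & 0xFF     # high 8 bits of AX

print(hex(ax), hex(al), hex(ah))
```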

    In the mov instruction, operands surrounded by brackets are treated as memory references to data. For example, [ebx] references the data at the memory address EBX. An operand such as [ebx+esi*4] uses an equation to calculate a memory address. This saves space, because it does not require separate instructions to perform the calculation contained within the brackets. Performing calculations such as this within an instruction is not possible unless you are calculating a memory address. For example, mov eax, ebx+esi*4 (without the brackets) is an invalid instruction.

    To compare mov with lea, suppose EBX is set to 0xB30040 and the value 0x20 is stored at address 0xB30048. The instruction mov eax, [ebx+8] places the value 0x20 (obtained from memory) into EAX, and the instruction lea eax, [ebx+8] places the value 0xB30048 into EAX.
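The difference between the two instructions can be sketched by modeling memory as a dictionary. The helper effective_address is hypothetical; it stands in for the address calculation done inside the brackets. The ESI value in the last line is an assumption added to show the scaled form.

```python
MASK32 = 0xFFFFFFFF

def effective_address(base=0, index=0, scale=1, disp=0):
    """Address computed by an operand such as [base + index*scale + disp]."""
    return (base + index * scale + disp) & MASK32

memory = {0xB30048: 0x20}      # the one memory cell used in the example
ebx = 0xB30040

# mov eax, [ebx+8]  -> compute the address, then read memory at that address
eax_after_mov = memory[effective_address(base=ebx, disp=8)]

# lea eax, [ebx+8]  -> compute the address only; memory is not read
eax_after_lea = effective_address(base=ebx, disp=8)

# [ebx+esi*4] with a hypothetical ESI = 2 happens to name the same address
scaled = effective_address(base=ebx, index=2, scale=4)

print(hex(eax_after_mov), hex(eax_after_lea), hex(scaled))
```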

    [Table: examples of the addition and subtraction instructions]

    When the decimal result of a multiplication is 5,000,000,000, it is too large to fit in a single 32-bit register, so the result is split across EDX (the upper 32 bits) and EAX (the lower 32 bits).

    The div value instruction does the same thing as mul, except in the opposite direction: It divides the 64 bits across EDX and EAX by value. Therefore, the EDX and EAX registers must be set up appropriately before the division occurs. The result of the division operation is stored in EAX, and the remainder is stored in EDX.

    [Table: examples of the mul and div instructions]

    The instructions imul and idiv are the signed versions of the mul and div instructions.
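How mul and div use the EDX and EAX pair can be sketched with ordinary integer arithmetic. The functions below are a simplified model of the unsigned 32-bit forms only, and the operands 100,000 and 50,000 are an assumed pair chosen because their product is 5,000,000,000.

```python
MASK32 = 0xFFFFFFFF

def mul(eax, value):
    """Model of unsigned mul: the 64-bit product is split across EDX:EAX."""
    product = (eax & MASK32) * (value & MASK32)
    return (product >> 32) & MASK32, product & MASK32      # (edx, eax)

def div(edx, eax, value):
    """Model of unsigned div: divide the 64 bits in EDX:EAX by value."""
    dividend = (edx << 32) | eax
    quotient, remainder = divmod(dividend, value)          # value == 0 raises here
    if quotient > MASK32:
        # On real hardware a quotient that does not fit in EAX is a divide error.
        raise OverflowError("quotient does not fit in EAX")
    return remainder, quotient                             # (edx, eax)

edx, eax = mul(100_000, 50_000)          # product is 5,000,000,000
print(hex(edx), hex(eax))

rem, quo = div(edx, eax, 50_000)         # undo the multiplication
print(rem, quo)
```

The overflow check is why EDX must be set up before a division: stale data in EDX becomes the upper half of the dividend.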


    The stack is used for short-term storage only. It frequently stores local variables, parameters, and the return address. Its primary usage is for the management of data exchanged between function calls. The implementation of this management varies among compilers, but the most common convention is for local variables and parameters to be referenced relative to EBP.

    We will explore alternatives later.

    Many functions contain a prologue—a few lines of code at the start of the function. The prologue prepares the stack and registers for use within the function. In the same vein, an epilogue at the end of a function restores the stack and registers to their state before the function was called.

    The following list summarizes the flow of the most common implementation for function calls. The layout of an individual stack frame, discussed a bit later, clarifies the organization of stacks.

    1. Arguments are placed on the stack using push instructions.

    2. A function is called using call memory_location. This causes the return address (the contents of the EIP register, which at that point holds the address of the instruction following the call) to be pushed onto the stack. This address will be used to return to the main code when the function is finished. When the function begins, EIP is set to memory_location (the start of the function).

    Each time a call is performed, a new stack frame is generated. A function maintains its own stack frame until it returns, at which time the caller’s stack frame is restored and execution is transferred back to the calling function.
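The first two steps above, together with the matching return, can be sketched with a tiny model of ESP, EIP, and the stack. The class name Cpu and all addresses here are hypothetical; the model covers only push, call, and ret and ignores the prologue, epilogue, and argument cleanup.

```python
class Cpu:
    """Minimal model: a 32-bit stack that grows toward lower addresses."""

    def __init__(self, esp, eip):
        self.esp = esp
        self.eip = eip
        self.stack = {}                 # address -> 4-byte value

    def push(self, value):
        self.esp -= 4
        self.stack[self.esp] = value

    def pop(self):
        value = self.stack.pop(self.esp)
        self.esp += 4
        return value

    def call(self, target, return_address):
        self.push(return_address)       # address of the instruction after the call
        self.eip = target               # execution continues at the function start

    def ret(self):
        self.eip = self.pop()           # resume in the caller


cpu = Cpu(esp=0x12F040, eip=0x401000)   # hypothetical starting state
cpu.push(2)                             # step 1: arguments are pushed
cpu.push(1)
cpu.call(target=0x401100, return_address=0x401009)   # step 2
eip_in_function = cpu.eip
cpu.ret()
print(hex(eip_in_function), hex(cpu.eip), hex(cpu.esp))
```

After ret, the two arguments are still on the stack in this sketch; who removes them depends on the calling convention.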

      Consider one individual stack frame and the memory locations of its items. In this frame, ESP would point to the top of the stack, which is the memory address 0x12F02C. EBP would be set to 0x12F03C throughout the duration of the function, so that the local variables and arguments can be referenced using EBP. The arguments are pushed onto the stack before the call. If the instruction push eax were executed, ESP would be decremented by four and would contain 0x12F028, and the data contained in EAX would be copied to 0x12F028. If the instruction pop ebx were executed, the data at 0x12F028 would be moved into the EBX register, and then ESP would be incremented by four.
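The ESP arithmetic for push eax followed by pop ebx can be traced step by step. The value placed in EAX is an arbitrary assumption; the addresses come from the example above.

```python
esp = 0x12F02C          # top of the stack before the push
stack = {}              # address -> 4-byte value
eax = 0x11223344        # arbitrary value, assumed for illustration

# push eax: decrement ESP by four, then store EAX at the new top of the stack
esp -= 4
stack[esp] = eax
esp_after_push = esp

# pop ebx: load from the top of the stack, then increment ESP by four
ebx = stack[esp]
esp += 4

print(hex(esp_after_push), hex(ebx), hex(esp))
```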


      The x86 architecture provides additional instructions for popping and pushing, the most popular of which are pusha and pushad. These instructions push all the registers onto the stack and are commonly used with popa and popad, which pop all the registers off the stack. The pusha and pushad instructions operate as follows:

      • pusha pushes the 16-bit registers on the stack in the following order: AX, CX, DX, BX, SP, BP, SI, DI.

      • pushad pushes the 32-bit registers on the stack in the following order: EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI.
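The pushad order listed above, and the matching popad, can be modeled as follows. This is a simplified sketch: it assumes that the ESP value stored is the one from before the instruction ran and that popad discards the saved ESP instead of loading it, which is how the instructions are generally documented. The register values are arbitrary.

```python
PUSHAD_ORDER = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]

def pushad(regs, stack):
    """Push all eight 32-bit registers in the documented order."""
    original_esp = regs["ESP"]               # assumed: ESP as it was before pushad
    for name in PUSHAD_ORDER:
        value = original_esp if name == "ESP" else regs[name]
        regs["ESP"] -= 4
        stack[regs["ESP"]] = value

def popad(regs, stack):
    """Pop the registers in reverse order; the saved ESP is skipped."""
    for name in reversed(PUSHAD_ORDER):
        value = stack.pop(regs["ESP"])
        regs["ESP"] += 4
        if name != "ESP":
            regs[name] = value

regs = {"EAX": 1, "ECX": 2, "EDX": 3, "EBX": 4,
        "ESP": 0x12F02C, "EBP": 0x12F03C, "ESI": 5, "EDI": 6}
saved = dict(regs)
stack = {}

pushad(regs, stack)
esp_after_pushad = regs["ESP"]
regs.update(EAX=0, ECX=0, EDX=0, EBX=0, ESI=0, EDI=0)   # code clobbers registers
popad(regs, stack)
print(regs == saved, hex(esp_after_pushad))
```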

      [Table: how the cmp instruction impacts the flags]

      [Table: the most common conditional jump instructions and how they operate]

      Jcc is the shorthand for generally describing conditional jumps.
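How cmp feeds a conditional jump can be sketched for unsigned 32-bit operands. The model below tracks only the zero flag (ZF) and carry flag (CF) and covers four jumps (jz, jnz, ja, jb) as an assumed, partial selection; it is not a full list of flags or Jcc forms.

```python
MASK32 = 0xFFFFFFFF

def cmp(dst, src):
    """Return (ZF, CF) as 'cmp dst, src' sets them for unsigned operands."""
    dst &= MASK32
    src &= MASK32
    zf = int(dst == src)     # ZF is set when the operands are equal
    cf = int(dst < src)      # CF is set when dst is below src (a borrow occurs)
    return zf, cf

# Condition tested by each jump, expressed on (ZF, CF)
JCC = {
    "jz":  lambda zf, cf: zf == 1,                # jump if equal / zero
    "jnz": lambda zf, cf: zf == 0,                # jump if not equal
    "ja":  lambda zf, cf: zf == 0 and cf == 0,    # jump if above (unsigned >)
    "jb":  lambda zf, cf: cf == 1,                # jump if below (unsigned <)
}

def taken(jump, dst, src):
    """Would 'cmp dst, src' followed by this jump transfer control?"""
    return JCC[jump](*cmp(dst, src))

print(cmp(5, 5), cmp(3, 5), cmp(7, 5))
print(taken("ja", 7, 5), taken("jb", 7, 5))
```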


      The most common data buffer manipulation instructions are movsx, cmpsx, stosx, and scasx, where x = b, w, or d for byte, word, or double word, respectively. These instructions work with any type of data, but our focus in this section will be bytes, so we will use movsb, cmpsb, and so on.

      The ESI and EDI registers are used in these operations. ESI is the source index register, and EDI is the destination index register. ECX is used as the counting variable.

      These instructions require a prefix to operate on data lengths greater than 1. The movsb instruction will move only a single byte and does not utilize the ECX register.

      For the rep prefix to be useful with most data buffer manipulation instructions, ESI, EDI, and ECX must be properly initialized.

      [Table: some common rep instructions and their operation]
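The roles of ESI, EDI, and ECX can be sketched with a model of rep movsb. It assumes forward copying (addresses increase after each byte) and treats memory as one flat bytearray; the function name and buffer layout are hypothetical.

```python
def rep_movsb(memory, esi, edi, ecx):
    """Copy ECX bytes from memory[ESI] to memory[EDI], one byte per iteration."""
    while ecx != 0:
        memory[edi] = memory[esi]   # movsb: move a single byte
        esi += 1                    # source index advances
        edi += 1                    # destination index advances
        ecx -= 1                    # rep counts down until ECX reaches 0
    return esi, edi, ecx

memory = bytearray(b"HELLO" + bytes(5))     # source at offset 0, destination at 5
esi, edi, ecx = rep_movsb(memory, esi=0, edi=5, ecx=5)
print(bytes(memory), esi, edi, ecx)
```

With ECX left at 0 the loop body never runs, which is why an uninitialized counter makes the prefix useless.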

      Consider the C code for a simple program and its compiled form. This example will help you understand how the parameters listed in Table 4-12 are accessed in assembly code. argc is compared to 3, and argv[1] is compared to -r through the use of a strncmp. Notice how argv[1] is accessed: First the location of the beginning of the array is loaded into eax, and then 4 (the offset) is added to eax to get argv[1]. The number 4 is used because each entry in the argv array is an address to a string, and each address is 4 bytes in size on a 32-bit system. If -r is provided on the command line, the next block of code will be executed, which is when we see argv[2] accessed at offset 8 relative to argv and provided as an argument to the DeleteFileA function.
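The offsets 4 and 8 follow from the size of a 32-bit pointer. The sketch below packs a hypothetical argv array of three string addresses into bytes and reads entries back by byte offset; the addresses are invented for illustration.

```python
import struct

POINTER_SIZE = 4   # each argv entry is a 4-byte address on a 32-bit system

# Hypothetical addresses of three strings: program name, "-r", and a file name
argv_table = struct.pack("<3I", 0x00410000, 0x00410010, 0x00410020)

def read_pointer(table, byte_offset):
    """Read one little-endian 32-bit address at the given byte offset."""
    return struct.unpack_from("<I", table, byte_offset)[0]

argv1 = read_pointer(argv_table, 1 * POINTER_SIZE)   # offset 4
argv2 = read_pointer(argv_table, 2 * POINTER_SIZE)   # offset 8

print(hex(argv1), hex(argv2))
```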

      The x86 architecture manual set includes the following:

      Volume 1: Basic Architecture

      • This manual describes the architecture and programming environment. It is useful for helping you understand how memory works, including registers, memory layout, addressing, and the stack. This manual also contains details about general instruction groups.

      Volume 2A: Instruction Set Reference, A–M, and Volume 2B: Instruction Set Reference, N–Z

      • These are the most useful manuals for the malware analyst. They alphabetize the entire instruction set and discuss every aspect of each instruction, including the format of the instruction, opcode information, and how the instruction impacts the system.

      Volume 3A: System Programming Guide, and Volume 3B: System Programming Guide

      • In addition to general-purpose registers, x86 has many special-purpose registers and instructions that impact execution and support the OS, including debugging, memory management, protection, task management, interrupt and exception handling, multiprocessor support, and more. If you encounter special-purpose registers, refer to the System Programming Guide to see how they impact execution.

      Optimization Reference Manual

      • This manual describes code-optimization techniques for applications. It offers additional insight into the code generated by compilers and has many good examples of how instructions can be used in unconventional ways.
