The central processing unit (CPU) executes code.
The main memory of the system (RAM) stores all data and code.
An input/output system (I/O) interfaces with devices such as hard drives, keyboards, and monitors.
The mnemonic identifies the instruction to execute, such as mov, which moves data. Operands typically identify the information used by the instruction, such as registers or data.
The opcodes for the instruction mov ecx, 0x42 are B9 42 00 00 00. The value 0xB9 corresponds to mov ecx, and 0x42000000 corresponds to the value 0x42, because x86 stores multibyte values in little-endian order (least significant byte first).
Register operands refer to registers, such as ecx in the instruction above.
Memory address operands refer to a memory address that contains the value of interest, typically denoted by a value, register, or equation between brackets, such as [eax].
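As a rough illustration of the opcode bytes for mov ecx, 0x42, the following Python sketch assembles the instruction by hand. The helper name encode_mov_r32_imm32 and the register table are illustrative, not part of any real assembler. The sketch assumes the short-form encoding, in which the opcode byte is 0xB8 plus the register number, followed by the 32-bit immediate in little-endian order.

```python
import struct

# Register numbers for the short-form "mov r32, imm32" encoding (opcode 0xB8 + number).
REGISTER_NUMBERS = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
                    "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def encode_mov_r32_imm32(register, immediate):
    """Hypothetical helper: return the opcode bytes for `mov register, immediate`."""
    opcode = 0xB8 + REGISTER_NUMBERS[register]
    # "<I" packs an unsigned 32-bit integer in little-endian byte order.
    return bytes([opcode]) + struct.pack("<I", immediate)


encoded = encode_mov_r32_imm32("ecx", 0x42)
print(" ".join(f"{byte:02X}" for byte in encoded))  # B9 42 00 00 00
```

Reading the last four bytes back in reverse order gives 00 00 00 42, which is the immediate value 0x42.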
General registers are used by the CPU during execution.
Segment registers are used to track sections of memory.
Status flags are used to make decisions.
Instruction pointers are used to keep track of the next instruction to execute.
You can use these categories as a reference throughout this chapter to see how a register is categorized and broken down. The sections that follow discuss each of these register categories in depth.
In a mov instruction, operands surrounded by brackets are treated as memory references to data. For example, [ebx] references the data at the memory address stored in EBX. An operand can also use an equation to calculate a memory address, such as [ebx+esi*4]. This saves space, because it does not require separate instructions to perform the calculation contained within the brackets. Performing calculations such as this within an instruction is not possible unless you are calculating a memory address. For example, mov eax, ebx+esi*4 (without the brackets) is an invalid instruction.
The lea (load effective address) instruction calculates a memory address in the same way but does not read the memory at that address. For example, suppose EBX contains 0xB30040 and the value 0x20 is stored at memory address 0xB30048. The instruction mov eax, [ebx+8] places the value 0x20 (obtained from memory) into EAX, and the instruction lea eax, [ebx+8] places the value 0xB30048 into EAX.
The div value instruction does the same thing as mul, except in the opposite direction: it divides the 64 bits across EDX and EAX by value. Therefore, the EDX and EAX registers must be set up appropriately before the division occurs. The result of the division operation (the quotient) is stored in EAX, and the remainder is stored in EDX.
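The way mul and div spread a 64-bit value across EDX and EAX can be modeled with a short Python sketch. The function names mul32 and div32 are illustrative, and the model covers only the unsigned 32-bit forms of the instructions.

```python
MASK32 = 0xFFFFFFFF


def mul32(eax, value):
    """Model of `mul value`: return (edx, eax) holding the 64-bit product EAX * value."""
    product = (eax & MASK32) * (value & MASK32)
    return (product >> 32) & MASK32, product & MASK32


def div32(edx, eax, value):
    """Model of `div value`: divide EDX:EAX by value and return (eax, edx),
    that is, (quotient, remainder)."""
    divisor = value & MASK32
    if divisor == 0:
        raise ZeroDivisionError("dividing by zero raises a divide error on real hardware")
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)
    quotient, remainder = divmod(dividend, divisor)
    if quotient > MASK32:
        # The quotient must fit in EAX; otherwise the CPU raises a divide error.
        raise OverflowError("quotient does not fit in 32 bits")
    return quotient, remainder


edx, eax = mul32(0xFFFFFFFF, 2)          # product 0x1FFFFFFFE: EDX = 1, EAX = 0xFFFFFFFE
quotient, remainder = div32(edx, eax, 2)  # back to 0xFFFFFFFF with remainder 0
```

The overflow check shows why EDX must be set up before a div: a stale value in EDX changes the dividend and can make the quotient too large for EAX.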
The instructions imul and idiv are the signed versions of the mul and div instructions.
The stack is used for short-term storage only. It frequently stores local variables, parameters, and the return address. Its primary usage is for the management of data exchanged between function calls. The implementation of this management varies among compilers, but the most common convention is for local variables and parameters to be referenced relative to EBP.
Many functions contain a prologue—a few lines of code at the start of the function. The prologue prepares the stack and registers for use within the function. In the same vein, an epilogue at the end of a function restores the stack and registers to their state before the function was called.
The following list summarizes the flow of the most common implementation for function calls. A later diagram of the stack layout for an individual stack frame clarifies the organization of stacks.
Arguments are placed on the stack using push instructions.
A function is called using call memory_location. This causes the current instruction address (that is, the contents of the EIP register) to be pushed onto the stack. This address will be used to return to the main code when the function is finished. When the function begins, EIP is set to memory_location (the start of the function).
The stack is laid out in memory as a series of stack frames. Each time a call is performed, a new stack frame is generated. A function maintains its own stack frame until it returns, at which time the caller’s stack frame is restored and execution is transferred back to the calling function.
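The call flow described above (arguments pushed, return address pushed by call, EIP redirected, and control handed back on return) can be sketched as a toy Python model. The class StackMachine and every address and argument value below are hypothetical, chosen only for illustration.

```python
class StackMachine:
    """Toy model of the stack, ESP, and EIP; memory is a dict keyed by address."""

    def __init__(self, esp, eip):
        self.esp = esp
        self.eip = eip
        self.memory = {}

    def push(self, value):
        self.esp -= 4                  # the stack grows toward lower addresses
        self.memory[self.esp] = value

    def pop(self):
        value = self.memory[self.esp]
        self.esp += 4
        return value

    def call(self, target, return_address):
        self.push(return_address)      # saved so the function can return later
        self.eip = target              # execution continues at the function

    def ret(self):
        self.eip = self.pop()          # resume in the caller


machine = StackMachine(esp=0x12F02C, eip=0x401000)  # hypothetical starting state
machine.push(0x20)                                   # an argument
machine.push(0x10)                                   # another argument
machine.call(target=0x402000, return_address=0x40100A)
esp_inside_function = machine.esp
machine.ret()
```

After ret, EIP holds the saved return address, and the two arguments are still on the stack; whether the caller or the called function removes them depends on the calling convention.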
Suppose ESP initially contains 0x12F02C. If the instruction push eax were executed, ESP would be decremented by four and would contain 0x12F028, and the data contained in EAX would be copied to 0x12F028. If the instruction pop ebx were then executed, the data at 0x12F028 would be moved into the EBX register, and then ESP would be incremented by four.
The x86 architecture provides additional instructions for popping and pushing, the most popular of which are pusha and pushad. These instructions push all the registers onto the stack and are commonly used with popa and popad, which pop all the registers off the stack. The pusha and pushad instructions operate as follows:
pusha pushes the 16-bit registers on the stack in the following order: AX, CX, DX, BX, SP, BP, SI, DI.
pushad pushes the 32-bit registers on the stack in the following order: EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI.
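The push, pop, and pushad behavior described above can be traced with a minimal Python sketch. The helper functions and the register values are illustrative assumptions; only the starting ESP of 0x12F02C is inferred from the addresses in the text. The sketch also assumes that pushad stores the value ESP held before the instruction began.

```python
PUSHAD_ORDER = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def push(registers, memory, value):
    registers["esp"] -= 4                      # make room first
    memory[registers["esp"]] = value           # then store at the new top


def pop(registers, memory, destination):
    registers[destination] = memory[registers["esp"]]
    registers["esp"] += 4


def pushad(registers, memory):
    original_esp = registers["esp"]            # value of ESP before pushad started
    for name in PUSHAD_ORDER:
        value = original_esp if name == "esp" else registers[name]
        push(registers, memory, value)


# Hypothetical register contents for illustration.
registers = {"eax": 0x11111111, "ecx": 0x22222222, "edx": 0x33333333,
             "ebx": 0x44444444, "esp": 0x12F02C, "ebp": 0x12F040,
             "esi": 0x55555555, "edi": 0x66666666}
memory = {}

push(registers, memory, registers["eax"])      # ESP becomes 0x12F028
esp_after_push = registers["esp"]
pop(registers, memory, "ebx")                  # EBX receives the pushed value
esp_after_pop = registers["esp"]

pushad(registers, memory)                      # eight pushes, 32 bytes in total
```

Because EAX is pushed first and EDI last, EDI ends up at the lowest address, which is where ESP points after pushad completes.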
The cmp instruction impacts the flags according to the result of comparing its two operands.
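The effect of cmp on the zero flag (ZF) and carry flag (CF) can be modeled in a few lines of Python. The function name cmp_flags is illustrative, and the sketch treats both operands as unsigned 32-bit values and ignores the other status flags.

```python
def cmp_flags(dst, src):
    """Return (ZF, CF) as set by `cmp dst, src` on unsigned 32-bit operands."""
    dst &= 0xFFFFFFFF
    src &= 0xFFFFFFFF
    zero_flag = 1 if dst == src else 0   # dst - src is zero
    carry_flag = 1 if dst < src else 0   # the subtraction needed a borrow
    return zero_flag, carry_flag


print(cmp_flags(5, 5))  # (1, 0): operands are equal
print(cmp_flags(3, 5))  # (0, 1): dst is less than src
print(cmp_flags(5, 3))  # (0, 0): dst is greater than src
```

Conditional jumps that follow a cmp test these flags to decide whether to branch.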
The most common data buffer manipulation instructions are movsx, cmpsx, stosx, and scasx, where x = b, w, or d for byte, word, or double word, respectively. These instructions work with any type of data, but our focus in this section will be bytes, so we will use movsb, cmpsb, and so on.
The ESI and EDI registers are used in these operations. ESI is the source index register, and EDI is the destination index register. ECX is used as the counting variable.
These instructions require a prefix to operate on data lengths greater than 1. The movsb instruction will move only a single byte and does not utilize the ECX register.
The rep prefix repeats an instruction, decrementing ECX after each repetition, until ECX reaches 0. Therefore, in most data buffer manipulation instructions, ESI, EDI, and ECX must be properly initialized for the rep instruction to be useful.
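The roles of ESI, EDI, and ECX in a repeated byte copy can be sketched in Python. This is a simplified model of rep movsb that assumes the direction flag is clear, so both index registers move forward; the function name and the buffer contents are illustrative.

```python
def rep_movsb(memory, esi, edi, ecx):
    """Model of `rep movsb` with the direction flag clear:
    copy ECX bytes from [ESI] to [EDI] and return the updated registers."""
    while ecx != 0:
        memory[edi] = memory[esi]   # movsb: copy one byte
        esi += 1                    # both index registers advance by one byte
        edi += 1
        ecx -= 1                    # rep: decrement ECX and stop at zero
    return esi, edi, ecx


# 8 source bytes followed by an 8-byte destination buffer.
memory = bytearray(b"malware\x00" + bytes(8))
esi, edi, ecx = rep_movsb(memory, esi=0, edi=8, ecx=8)
```

If ECX were left uninitialized, the loop would copy an unpredictable number of bytes, which is why all three registers must be set before the rep instruction runs.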
argc is compared to 3 at ❶, and argv[1] is compared to -r at ❷ through a call to strncmp. Notice how argv[1] is accessed: first the location of the beginning of the array is loaded into eax, and then 4 (the offset) is added to eax to get argv[1]. The number 4 is used because each entry in the argv array is the address of a string, and each address is 4 bytes in size on a 32-bit system. If -r is provided on the command line, the code starting at ❸ will be executed, which is when we see argv[2] accessed at offset 8 relative to argv and provided as an argument to the DeleteFileA function.
Volume 1: Basic Architecture
This manual describes the architecture and programming environment. It is useful for helping you understand how memory works, including registers, memory layout, addressing, and the stack. This manual also contains details about general instruction groups.
Volume 2A: Instruction Set Reference, A–M, and Volume 2B: Instruction Set Reference, N–Z
These are the most useful manuals for the malware analyst. They alphabetize the entire instruction set and discuss every aspect of each instruction, including the format of the instruction, opcode information, and how the instruction impacts the system.
Volume 3A: System Programming Guide, and Volume 3B: System Programming Guide
In addition to general-purpose registers, x86 has many special-purpose registers and instructions that impact execution and support the OS, including debugging, memory management, protection, task management, interrupt and exception handling, multiprocessor support, and more. If you encounter special-purpose registers, refer to the System Programming Guide to see how they impact execution.
Optimization Reference Manual
This manual describes code-optimization techniques for applications. It offers additional insight into the code generated by compilers and has many good examples of how instructions can be used in unconventional ways.