Chapter 3: x86/x64 Architecture

🖥️ What is CPU Architecture?

📖 Definition: CPU Architecture

The fundamental design of the processor - what instructions it understands, how much memory it can handle, and how it's organized internally.

📖 Definition: x86

A family of instruction set architectures that Intel originated and AMD also implements. The name comes from the processor models 8086, 80286, 80386...
In everyday RE usage: x86 = 32-bit | x64 (also called x86-64 or AMD64) = 64-bit

Most Windows desktops and most servers use x86/x64. This is the architecture you'll encounter most often when doing RE on Windows software.

📦 Registers - The CPU's "Variables"

📖 Definition: Register

A small, extremely fast storage area inside the CPU itself. Used to store temporary data during calculations. Much faster than RAM.

General Purpose Registers

In 32-bit, there are 8 general purpose registers, each 32 bits (4 bytes) in size:

Name	Full Name	Common Use
`EAX`	Extended Accumulator	Function return values, calculations
`EBX`	Extended Base	Data pointer, general use
`ECX`	Extended Counter	Loop counter, iterations
`EDX`	Extended Data	Extension for EAX, I/O operations
`ESI`	Extended Source Index	Source pointer in copy operations
`EDI`	Extended Destination Index	Destination pointer in copy operations
`EBP`	Extended Base Pointer	Points to base of Stack Frame
`ESP`	Extended Stack Pointer	Points to top of Stack

Register Subdivision - 32/16/8 bit

The four registers below can be accessed in smaller parts. (ESI, EDI, EBP, and ESP also have 16-bit forms, SI, DI, BP, and SP, but no high-byte form.)

Size	EAX	EBX	ECX	EDX
32-bit	`EAX`	`EBX`	`ECX`	`EDX`
16-bit	`AX`	`BX`	`CX`	`DX`
8-bit high	`AH`	`BH`	`CH`	`DH`
8-bit low	`AL`	`BL`	`CL`	`DL`
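
The subdivision in the table can be sketched with plain bit masks. This is a minimal Python illustration, not CPU code; the helper name `split_register` is made up for this example.

```python
def split_register(eax: int) -> dict:
    """Return the sub-register views of a 32-bit value (hypothetical helper)."""
    eax &= 0xFFFFFFFF            # keep only the low 32 bits
    ax = eax & 0xFFFF            # AX = low 16 bits of EAX
    return {
        "EAX": eax,
        "AX": ax,
        "AH": (ax >> 8) & 0xFF,  # AH = bits 8-15
        "AL": ax & 0xFF,         # AL = bits 0-7
    }

for name, value in split_register(0x12345678).items():
    print(f"{name} = 0x{value:X}")
# EAX = 0x12345678
# AX = 0x5678
# AH = 0x56
# AL = 0x78
```

Note that the upper 16 bits of EAX have no name of their own, which is why only AX, AH, and AL appear.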

In 64-bit (x64)

Registers were extended to 64 bits and 8 new ones were added:

RAX, RBX, RCX, RDX - extended versions of existing
RSI, RDI, RBP, RSP - extended versions of existing
R8-R15 - 8 new registers!

💡 How to identify 32 vs 64 bit

32-bit: E prefix  --> EAX, EBX, ECX, EDX...
64-bit: R prefix  --> RAX, RBX, RCX, RDX...
16-bit: no prefix --> AX, BX, CX, DX...

🚩 FLAGS Register

📖 Definition: FLAGS Register

A special register where each bit represents a status ("flag"). Flags change automatically after operations and are used for decisions (conditions, jumps).

The most important flags:

Flag	Name	When set (=1)
`ZF`	Zero Flag	Result is 0
`SF`	Sign Flag	Result is negative (top bit = 1)
`CF`	Carry Flag	Carry or borrow out of the top bit (unsigned overflow)
`OF`	Overflow Flag	Overflow in signed operation

💡 Example: Using Flags

CMP EAX, EBX    ; Compare: computes EAX - EBX, sets the flags, discards the result
JE  somewhere   ; Jump if Equal - jumps if ZF=1 (they are equal)
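
To see how the four flags come out of a comparison, here is a rough Python model of what CMP does to them. It is only a sketch: the function name `cmp_flags` is invented for this example, and the real CPU updates other flags as well.

```python
def cmp_flags(a: int, b: int, bits: int = 32) -> dict:
    """Model the ZF/SF/CF/OF result of CMP a, b (hypothetical helper)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask  # CMP subtracts but does not store the result
    return {
        "ZF": int(result == 0),            # operands were equal
        "SF": int(bool(result & sign)),    # top bit of the result
        "CF": int(a < b),                  # unsigned borrow
        # signed overflow: operands differ in sign and the result's sign differs from a's
        "OF": int(bool((a ^ b) & (a ^ result) & sign)),
    }

print(cmp_flags(5, 5))           # {'ZF': 1, 'SF': 0, 'CF': 0, 'OF': 0}
print(cmp_flags(1, 2))           # {'ZF': 0, 'SF': 1, 'CF': 1, 'OF': 0}
print(cmp_flags(0x80000000, 1))  # {'ZF': 0, 'SF': 0, 'CF': 0, 'OF': 1}
```

The first call is the JE case from the example: equal operands give ZF=1, so the jump is taken.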

📍 Instruction Pointer

📖 Definition: EIP / RIP

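EIP (Extended Instruction Pointer) in 32-bit, RIP in 64-bit.
A register that contains the address of the next instruction to be executed. It cannot be written with an ordinary MOV; it changes only through control-flow instructions such as JMP, CALL, and RET.

The CPU always looks at EIP/RIP to know where to read the next instruction from. After each instruction, EIP advances to the following instruction automatically, unless a jump, call, or return loads a new address.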

⚙️ Instruction Cycle (Fetch-Decode-Execute)

This is how the CPU executes each instruction:

Fetch: Read the instruction from the address EIP points to
Decode: Parse what the instruction is and its parameters
Execute: Perform the operation and update registers/memory
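
The three steps can be sketched as a loop. The toy machine below is an assumption made for illustration: its instructions are Python tuples rather than real x86 encodings, `eip` is a list index rather than a memory address, and the function name `run` is invented.

```python
def run(program: list, max_steps: int = 100) -> dict:
    """Run a toy program with a fetch-decode-execute loop (not real x86)."""
    regs = {"EAX": 0, "EBX": 0}
    eip = 0  # index of the next instruction to execute
    for _ in range(max_steps):
        if eip >= len(program):
            break
        instr = program[eip]   # Fetch: read the instruction EIP points to
        eip += 1               # EIP now points at the following instruction
        op, *args = instr      # Decode: split the operation from its operands
        if op == "mov":        # Execute: update registers or EIP
            regs[args[0]] = args[1] & 0xFFFFFFFF
        elif op == "add":
            regs[args[0]] = (regs[args[0]] + args[1]) & 0xFFFFFFFF
        elif op == "jmp":
            eip = args[0]      # a jump is the only way this toy changes EIP
        elif op == "hlt":
            break
        else:
            raise ValueError(f"unknown instruction: {op}")
    regs["EIP"] = eip
    return regs

program = [
    ("mov", "EAX", 1),
    ("jmp", 3),          # skip the next instruction
    ("mov", "EAX", 99),  # never executed
    ("add", "EAX", 41),
    ("hlt",),
]
print(run(program))  # {'EAX': 42, 'EBX': 0, 'EIP': 5}
```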

📊 Endianness - Byte Order

📖 Definition: Endianness

The order in which bytes are stored in memory.
Little Endian: The low byte is stored first at the lowest address (x86/x64 uses this!)
Big Endian: The high byte is stored first

What Does This Mean in Practice?

When we have a number that takes more than one byte (e.g., 4 bytes), the CPU needs to decide in what order to store it in memory. In Little Endian, the "little" (low) part of the number is stored first.

💡 Example: Little Endian

Let's say we have the value 0x12345678 (a 4-byte number) and we want to store it at address 0x100:

Breaking the number into 4 bytes:

0x12345678
  ││││││││
  ││││││└└── 0x78 (Least Significant Byte - LSB)
  ││││└└──── 0x56
  ││└└────── 0x34
  └└──────── 0x12 (Most Significant Byte - MSB)

How it looks in memory (Little Endian):

Address: 0x100  0x101  0x102  0x103
Value:   0x78   0x56   0x34   0x12
         ↑
         LSB is stored at the lowest address!

Note: Addresses 0x100-0x103 are just 4 consecutive memory cells - each cell holds one byte.
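
The same layout can be reproduced with Python's standard `struct` module, where `<` requests little endian and `>` requests big endian. This is only a quick check of the example above, not something the CPU itself runs.

```python
import struct

value = 0x12345678

little = struct.pack("<I", value)  # 4-byte unsigned integer, little endian
big = struct.pack(">I", value)     # same value, big endian

print(little.hex(" "))  # 78 56 34 12
print(big.hex(" "))     # 12 34 56 78

# Reading the little-endian bytes back recovers the original number
assert int.from_bytes(little, "little") == value
```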

🔧 Why Does This Matter in RE?

When you look at a raw byte dump in a debugger, you see the bytes in the order they are stored in memory. To get the actual value of a multi-byte number as the CPU sees it, you need to reverse the byte order!

💡 Practical Example

You see this memory in the debugger:

0x401000: 78 56 34 12 48 65 6C 6C 6F 00

Bytes	Type	What you see	Actual value
`78 56 34 12`	4-byte integer	78563412	0x12345678 (reverse!)
`48 65 6C 6C 6F 00`	String	-	"Hello" (don't reverse!)
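
The two rows of the table can be decoded in a few lines of Python. This sketch assumes we already know that the first 4 bytes are an integer and that a null-terminated ASCII string follows; the variable names are illustrative.

```python
dump = bytes.fromhex("78 56 34 12 48 65 6C 6C 6F 00")

# First 4 bytes: a little-endian integer, so the byte order is reversed
number = int.from_bytes(dump[0:4], "little")

# Remaining bytes: a string, read in memory order up to the null terminator
end = dump.index(0, 4)
text = dump[4:end].decode("ascii")

print(hex(number))  # 0x12345678
print(text)         # Hello
```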

📋 When to Reverse and When Not To?

Data Type	Reverse?	Explanation
Pointers / Addresses	✅ Yes	4/8 byte number
Integers	✅ Yes	2/4/8 byte number
Strings	❌ No	Each char is 1 byte - read left to right
Single byte	❌ No	Nothing to reverse
Byte array	❌ No	Read as you see it

🔍 How to Identify the Data Type?

Good question! Everything looks like hex. Here are the main ways:

1. The instruction tells you the size:

mov eax, [0x401000]    ; EAX = 32-bit  →  4 bytes  →  reverse!
mov al, [0x401000]     ; AL = 8-bit    →  1 byte   →  don't reverse
mov rax, [0x401000]    ; RAX = 64-bit  →  8 bytes  →  reverse!

2. The instruction type reveals it's an address:

call [0x401000]        ; CALL → the bytes at 0x401000 hold a function address
jmp [0x401000]         ; JMP → the bytes at 0x401000 hold a jump target
lea eax, [0x401000]    ; LEA = Load Effective Address → EAX = 0x401000 itself, memory is not read

3. Identify Strings by pattern:

48 65 6C 6C 6F 00  →  "Hello" + null terminator
│                │
│                └── 00 at end = end of string
└── Values 0x20-0x7E = readable ASCII characters

4. Identify Pointers by address range:

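00 10 40 00  →  reverse  →  0x00401000  ← inside a typical module range, plausible pointer
78 56 34 12  →  reverse  →  0x12345678  ← usually nothing mapped here, probably plain data

On 32-bit Windows, user-mode addresses lie below 0x80000000, and executables traditionally load at 0x00400000. A value is a plausible pointer only if it lands in a region the process actually has mapped, so check the debugger's memory map.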
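
Heuristics 3 and 4 can be sketched as two small checks. Everything here is an assumption for illustration: the function names, the minimum string length of 4, and especially the `MAPPED_REGIONS` list, which stands in for the memory map a real debugger would show for the process.

```python
# Hypothetical memory map: (start, end) of regions mapped in the process
MAPPED_REGIONS = [(0x00400000, 0x00410000)]

def looks_like_string(data: bytes, min_len: int = 4) -> bool:
    """True if data starts with printable ASCII followed by a null byte."""
    end = data.find(b"\x00")
    if end < min_len:  # no terminator found (-1) or text too short
        return False
    return all(0x20 <= b <= 0x7E for b in data[:end])

def looks_like_pointer32(data: bytes, regions=MAPPED_REGIONS) -> bool:
    """True if the first 4 bytes, read little endian, fall in a mapped region."""
    if len(data) < 4:
        return False
    value = int.from_bytes(data[:4], "little")
    return any(start <= value < end for start, end in regions)

print(looks_like_string(bytes.fromhex("48 65 6C 6C 6F 00")))  # True
print(looks_like_pointer32(bytes.fromhex("00 10 40 00")))     # True
print(looks_like_pointer32(bytes.fromhex("78 56 34 12")))     # False
```

These checks only rank guesses. Short integers can pass the string test by accident and plain data can happen to land in a mapped range, so the instructions that use the value remain the stronger evidence.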

💡 Practical Tip

You don't always know 100% what the data type is - and that's okay! Look at the instructions that use it, follow where the value goes, and learn from context. That's part of the art of RE.

🎯 The Simple Rule

What you see in memory:    78 56 34 12
                              ↓
Reverse the byte order:    0x12345678
                              ↓
This is the value the CPU actually sees and works with!

In short: Reversing bytes lets you "think like the CPU" - and that's exactly what we do in Reverse Engineering!

📋 Chapter Summary

x86 = 32-bit, x64 = 64-bit
General Purpose Registers: EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP
EIP/RIP = pointer to next instruction
FLAGS = flags indicating results (ZF, SF, CF, OF)
Little Endian = low byte stored first. Reverse bytes of numbers and addresses to see what the CPU sees!
Identify data type: Look at instruction (register size), operation type (call/jmp = address), and patterns (ASCII + null = string)
Cycle: Fetch → Decode → Execute → Repeat
