🖥️ What is CPU Architecture?
The fundamental design of the processor - what instructions it understands, how much memory it can handle, and how it's organized internally.
x86 is a family of architectures from Intel and AMD. The name comes from the processors 8086, 80286, 80386, and so on, which all end in "86".
x86 = 32-bit | x64 (or x86-64, AMD64) = 64-bit
Almost all desktop computers and most servers use x86/x64. This is the architecture you'll encounter most often when doing RE on Windows software.
📦 Registers - The CPU's "Variables"
A register is a small, extremely fast storage area inside the CPU itself, used to hold temporary data during calculations. Registers are much faster than RAM.
General Purpose Registers
In 32-bit, there are 8 general purpose registers, each 32 bits (4 bytes) in size:
| Name | Full Name | Common Use |
|---|---|---|
| EAX | Extended Accumulator | Function return values, calculations |
| EBX | Extended Base | Data pointer, general use |
| ECX | Extended Counter | Loop counter, iterations |
| EDX | Extended Data | Pairs with EAX (EDX:EAX) for multiply/divide and 64-bit results, I/O port numbers |
| ESI | Extended Source Index | Source pointer in copy operations |
| EDI | Extended Destination Index | Destination pointer in copy operations |
| EBP | Extended Base Pointer | Points to the base of the current stack frame |
| ESP | Extended Stack Pointer | Points to the top of the stack |
Register Subdivision - 32/16/8 bit
Each 32-bit register is divided into smaller parts that can be accessed separately:
| Size | EAX | EBX | ECX | EDX |
|---|---|---|---|---|
| 32-bit (bits 0-31) | EAX | EBX | ECX | EDX |
| 16-bit (bits 0-15) | AX | BX | CX | DX |
| 8-bit high (bits 8-15) | AH | BH | CH | DH |
| 8-bit low (bits 0-7) | AL | BL | CL | DL |
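The subdivision is just a matter of which bits you look at. A minimal sketch, assuming a plain Python integer stands in for the register value: masking and shifting EAX yields AX, AH and AL. The upper 16 bits of EAX have no register name of their own, so the sketch does not expose them.

```python
def split_eax(eax: int) -> dict:
    """Return the named 16- and 8-bit views of a 32-bit register value."""
    eax &= 0xFFFFFFFF                 # keep it a 32-bit value
    return {
        "EAX": eax,
        "AX": eax & 0xFFFF,           # bits 0-15
        "AH": (eax >> 8) & 0xFF,      # bits 8-15
        "AL": eax & 0xFF,             # bits 0-7
    }


views = split_eax(0x12345678)
for name, value in views.items():
    print(f"{name}: {value:#x}")
# EAX: 0x12345678
# AX: 0x5678
# AH: 0x56
# AL: 0x78
```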
In 64-bit (x64)
Registers were extended to 64 bits and 8 new ones were added:
- RAX, RBX, RCX, RDX - extended versions of the existing registers
- RSI, RDI, RBP, RSP - extended versions of the existing registers
- R8-R15 - 8 new registers!
32-bit: E prefix --> EAX, EBX, ECX, EDX...
64-bit: R prefix --> RAX, RBX, RCX, RDX...
16-bit: no prefix --> AX, BX, CX, DX...
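In 64-bit mode the smaller names are views of RAX, but writes behave differently by size: writing a 32-bit register such as EAX zero-extends into RAX (the upper 32 bits become 0), while writing AX or AL leaves every other bit of RAX unchanged. This matters in RE when a value seems to "lose" its upper half. The sketch below is a simplified model of that rule using plain integers, not an emulator.

```python
MASK64 = (1 << 64) - 1


def read_view(rax: int, bits: int) -> int:
    """Read RAX (64), EAX (32), AX (16) or AL (8): the low `bits` bits."""
    return rax & ((1 << bits) - 1)


def write_eax(rax: int, value: int) -> int:
    """Model a 32-bit write in 64-bit mode: the old RAX is discarded, upper half zeroed."""
    return value & 0xFFFFFFFF


def write_low(rax: int, value: int, bits: int) -> int:
    """Model a write to AX (bits=16) or AL (bits=8): all other bits of RAX are kept."""
    mask = (1 << bits) - 1
    return (rax & ~mask & MASK64) | (value & mask)


rax = 0x1122334455667788
print(hex(read_view(rax, 32)))          # 0x55667788 (EAX)
print(hex(write_low(rax, 0xAAAA, 16)))  # 0x112233445566aaaa (after writing AX)
print(hex(write_eax(rax, 0xAAAA)))      # 0xaaaa (after writing EAX)
```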
🚩 FLAGS Register
A special register where each bit represents a status ("flag"). Flags change automatically after operations and are used for decisions (conditions, jumps).
The most important flags:
| Flag | Name | When set (=1) |
|---|---|---|
| ZF | Zero Flag | Result is 0 |
| SF | Sign Flag | Result is negative (top bit = 1) |
| CF | Carry Flag | Unsigned overflow: carry out of (or borrow into) the top bit |
| OF | Overflow Flag | Signed overflow: the result does not fit in the signed range |
```asm
CMP EAX, EBX    ; Compare EAX to EBX: compute EAX - EBX, set the flags, discard the result
JE  somewhere   ; Jump if Equal: jumps if ZF=1 (the values are equal)
```
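To see how CMP produces these flags, here is a minimal sketch assuming 32-bit operands. It computes a - b with 32-bit wrap-around and derives ZF, SF, CF and OF from the result; the real instruction also updates AF and PF, which are left out here. The function name cmp_flags is illustrative.

```python
MASK32 = 0xFFFFFFFF


def sign_bit(x: int) -> int:
    """Top bit of a 32-bit value (1 = negative when read as signed)."""
    return (x >> 31) & 1


def cmp_flags(a: int, b: int) -> dict:
    """Flags that CMP a, b would set: subtract, keep the flags, drop the result."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32                 # wraps around like a 32-bit register
    return {
        "ZF": int(result == 0),               # operands were equal
        "SF": sign_bit(result),               # result looks negative
        "CF": int(a < b),                     # unsigned borrow
        # signed overflow: operand signs differ and the result's sign differs from a
        "OF": int(sign_bit(a) != sign_bit(b) and sign_bit(result) != sign_bit(a)),
    }


flags = cmp_flags(5, 5)
print(flags)            # {'ZF': 1, 'SF': 0, 'CF': 0, 'OF': 0}
if flags["ZF"]:
    print("JE would jump")
print(cmp_flags(1, 2))  # {'ZF': 0, 'SF': 1, 'CF': 1, 'OF': 0}
```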
📍 Instruction Pointer
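EIP (Extended Instruction Pointer) in 32-bit, or RIP in 64-bit.
A register that contains the address of the next instruction to be executed. It cannot be written directly (there is no MOV EIP, ...); it changes through control-flow instructions such as JMP, CALL, and RET.
The CPU always looks at EIP/RIP to know where to read the next instruction from. After each instruction is fetched, EIP advances automatically by that instruction's length, unless a jump, call, or return sets it to a new address.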
⚙️ Instruction Cycle (Fetch-Decode-Execute)
This is how the CPU executes each instruction:
- Fetch: Read the instruction from the address EIP points to
- Decode: Parse what the instruction is and its parameters
- Execute: Perform the operation and update registers/memory
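The cycle above, together with the automatic advance of the instruction pointer, can be sketched as a toy interpreter. The three opcodes (LOAD, ADD, HALT) and their byte values are invented for illustration and are not x86 encodings; ip stands in for EIP/RIP and acc for an accumulator register like EAX.

```python
# Hypothetical toy opcodes, NOT real x86 encodings.
LOAD, ADD, HALT = 0x01, 0x02, 0xFF


def run(memory: bytes):
    """Run a toy program; return the accumulator and the address of each instruction."""
    ip = 0          # plays the role of EIP/RIP
    acc = 0         # plays the role of EAX
    trace = []
    while True:
        trace.append(ip)
        opcode = memory[ip]                   # Fetch: read the byte ip points to
        if opcode == LOAD:                    # Decode: LOAD has a 1-byte operand
            acc = memory[ip + 1]              # Execute
            ip += 2                           # ip advances by the instruction length
        elif opcode == ADD:
            acc = (acc + memory[ip + 1]) & 0xFF   # 8-bit accumulator wraps around
            ip += 2
        elif opcode == HALT:
            return acc, trace
        else:
            raise ValueError(f"unknown opcode {opcode:#04x} at {ip}")


program = bytes([LOAD, 5, ADD, 3, HALT])
acc, trace = run(program)
print(acc, trace)   # 8 [0, 2, 4]
```

Because every instruction here is either 1 or 2 bytes long, the trace shows ip jumping by the length of each decoded instruction, just as EIP does on a real CPU with variable-length x86 instructions.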
📊 Endianness - Byte Order
The order in which bytes are stored in memory.
Little Endian: The low byte is stored first at the lowest address (x86/x64 uses this!)
Big Endian: The high byte is stored first
What Does This Mean in Practice?
When we have a number that takes more than one byte (e.g., 4 bytes), the CPU needs to decide in what order to store it in memory. In Little Endian, the "little" (low) part of the number is stored first.
Let's say we have the value 0x12345678 (a 4-byte number) and we want to store it at address 0x100:
Breaking the number into 4 bytes:
```text
0x12345678
  ││││││││
  ││││││└└── 0x78 (Least Significant Byte - LSB)
  ││││└└──── 0x56
  ││└└────── 0x34
  └└──────── 0x12 (Most Significant Byte - MSB)
```
How it looks in memory (Little Endian):
```text
Address:  0x100  0x101  0x102  0x103
Value:    0x78   0x56   0x34   0x12
          ↑
          LSB is stored at the lowest address!
```
Note: Addresses 0x100-0x103 are just 4 consecutive memory cells - each cell holds one byte.
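Python's standard struct module can reproduce this layout. In the sketch below, "<I" packs a 4-byte unsigned integer in little-endian order and ">I" in big-endian order for comparison; the addresses 0x100-0x103 are only the example addresses from above.

```python
import struct

value = 0x12345678
little = struct.pack("<I", value)   # "<" = little-endian, "I" = 4-byte unsigned int
big = struct.pack(">I", value)      # ">" = big-endian, for comparison

# Show each byte next to the example address it would occupy.
for offset, byte in enumerate(little):
    print(f"{0x100 + offset:#x}: {byte:#04x}")
# 0x100: 0x78
# 0x101: 0x56
# 0x102: 0x34
# 0x103: 0x12

print(big.hex(" "))   # 12 34 56 78
```

value.to_bytes(4, "little") gives the same bytes as struct.pack("<I", value) without importing anything.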
🔧 Why Does This Matter in RE?
When you look at memory in a debugger's hex view, you see the bytes in the order they are stored. To get the value the CPU actually works with when it reads a multi-byte number, you need to reverse the byte order!
You see this memory in the debugger:
```text
0x401000: 78 56 34 12 48 65 6C 6C 6F 00
```
| Bytes | Type | What you see | Actual value |
|---|---|---|---|
| 78 56 34 12 | 4-byte integer | 78563412 | 0x12345678 (reverse!) |
| 48 65 6C 6C 6F 00 | String | - | "Hello" (don't reverse!) |
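Decoding that dump by hand in Python is a two-step sketch, assuming (as in the table) that the first 4 bytes are an integer and the rest is a null-terminated ASCII string: int.from_bytes with "little" does the reversing, while the string bytes are decoded in the order they appear.

```python
dump = bytes.fromhex("78 56 34 12 48 65 6C 6C 6F 00")

# First 4 bytes: a 32-bit integer stored little-endian, so reverse to read it.
number = int.from_bytes(dump[0:4], "little")
print(hex(number))   # 0x12345678

# Remaining bytes: an ASCII string, read in order up to the null terminator.
text = dump[4:].split(b"\x00", 1)[0].decode("ascii")
print(text)          # Hello
```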
📋 When to Reverse and When Not To?
| Data Type | Reverse? | Explanation |
|---|---|---|
| Pointers / Addresses | ✅ Yes | 4/8 byte number |
| Integers | ✅ Yes | 2/4/8 byte number |
| Strings | ❌ No | Each char is 1 byte - read left to right |
| Single byte | ❌ No | Nothing to reverse |
| Byte array | ❌ No | Read as you see it |
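The table can be read as a small decision function. This is a sketch only: the kind labels ("pointer", "integer", "string", "byte", "byte array") are made up for illustration, and in practice you decide the kind from context as described below.

```python
def interpret(raw: bytes, kind: str):
    """Apply the 'reverse or not' table to bytes copied from a hex view."""
    if kind in ("pointer", "integer"):
        return int.from_bytes(raw, "little")              # multi-byte number: reverse
    if kind == "string":
        return raw.split(b"\x00", 1)[0].decode("ascii")   # read left to right
    if kind == "byte":
        return raw[0]                                     # one byte: nothing to reverse
    if kind == "byte array":
        return list(raw)                                  # keep the order as shown
    raise ValueError(f"unknown kind: {kind}")


print(hex(interpret(bytes.fromhex("34 12"), "integer")))        # 0x1234
print(hex(interpret(bytes.fromhex("00 10 40 00"), "pointer")))  # 0x401000
print(interpret(bytes.fromhex("41 42 43 00"), "string"))        # ABC
print(interpret(bytes.fromhex("DE AD BE EF"), "byte array"))    # [222, 173, 190, 239]
```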
🔍 How to Identify the Data Type?
Good question! Everything looks like hex. Here are the main ways:
1. The instruction tells you the size:
```asm
mov eax, [0x401000]   ; EAX = 32-bit → 4 bytes → reverse!
mov al, [0x401000]    ; AL = 8-bit → 1 byte → don't reverse
mov rax, [0x401000]   ; RAX = 64-bit → 8 bytes → reverse!
```
2. The instruction type reveals it's an address:
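```asm
call [0x401000]       ; indirect CALL → the bytes at 0x401000 hold a function address
jmp  [0x401000]       ; indirect JMP → the bytes at 0x401000 hold a jump target address
lea  eax, [0x401000]  ; LEA = Load Effective Address → EAX = 0x401000 itself, an address (no memory read)
```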
3. Identify Strings by pattern:
```text
48 65 6C 6C 6F 00  →  "Hello" + null terminator
│              │
│              └── 00 at end = end of string
└── Values 0x20-0x7E = readable ASCII characters
```
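4. Identify Pointers by address range:
```text
00 10 40 00  →  reverse  →  0x00401000  ← inside the program's image: likely a pointer!
78 56 34 12  →  reverse  →  0x12345678  ← far outside the image: probably not a pointer
```
On 32-bit Windows, user-mode addresses are below 0x80000000 by default, and executables are commonly loaded at 0x00400000. A range check alone is weak (0x12345678 is also below 0x80000000), so the stronger test is whether the value lands inside a region the debugger's memory map shows as mapped, such as the program's own image.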
You don't always know 100% what the data type is - and that's okay! Look at the instructions that use it, follow where the value goes, and learn from context. That's part of the art of RE.
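Heuristics 1, 3 and 4 from the list can be combined into a rough sketch. REGISTER_SIZE is a partial table, and MAPPED_REGIONS is a hypothetical memory map (a main image at 0x00400000 with an assumed size); a real debugger's memory map would replace it. For pointers the sketch checks mapped regions rather than the whole 0x00400000-0x7FFFFFFF range, because 0x12345678 also falls inside that range.

```python
# Illustrative assumptions, not output from a real debugger.
REGISTER_SIZE = {"al": 1, "ax": 2, "eax": 4, "rax": 8}   # partial table
MAPPED_REGIONS = [(0x00400000, 0x00410000)]               # hypothetical EXE image


def read_as(register: str, raw: bytes) -> int:
    """1. Size from the instruction: read as many bytes as the register holds."""
    size = REGISTER_SIZE[register]
    return int.from_bytes(raw[:size], "little")   # multi-byte number: little-endian


def find_strings(data: bytes, min_len: int = 4) -> list:
    """3. Runs of printable ASCII (0x20-0x7E) that end in a 0x00 terminator."""
    found, start = [], None
    for i, b in enumerate(data):
        if 0x20 <= b <= 0x7E:
            if start is None:
                start = i                          # a new candidate run begins
        else:
            if b == 0x00 and start is not None and i - start >= min_len:
                found.append((start, data[start:i].decode("ascii")))
            start = None                           # any non-printable byte ends the run
    return found


def looks_like_pointer(raw: bytes) -> bool:
    """4. Reverse the bytes, then check whether the value lands in a mapped region."""
    value = int.from_bytes(raw, "little")
    return any(lo <= value < hi for lo, hi in MAPPED_REGIONS)


memory = bytes.fromhex("78 56 34 12 48 65 6C 6C 6F 00")
print(hex(read_as("eax", memory)))                       # 0x12345678
print(hex(read_as("al", memory)))                        # 0x78
print(find_strings(memory))                              # [(4, 'Hello')]
print(looks_like_pointer(bytes.fromhex("00 10 40 00")))  # True
print(looks_like_pointer(bytes.fromhex("78 56 34 12")))  # False
```

Note that 78 56 34 are printable too ('x', 'V', '4'); the null-terminator and minimum-length checks are what keep them from being reported as a string.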
```text
What you see in memory:   78 56 34 12
                               ↓
Reverse the byte order:   0x12345678
                               ↓
This is the value the CPU actually sees and works with!
```
In short: Reversing bytes lets you "think like the CPU" - and that's exactly what we do in Reverse Engineering!
📋 Chapter Summary
- x86 = 32-bit, x64 = 64-bit
- General Purpose Registers: EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP
- EIP/RIP = pointer to next instruction
- FLAGS = flags indicating results (ZF, SF, CF, OF)
- Little Endian = low byte stored first. Reverse bytes of numbers and addresses to see what the CPU sees!
- Identify data type: Look at instruction (register size), operation type (call/jmp = address), and patterns (ASCII + null = string)
- Cycle: Fetch → Decode → Execute → Repeat