# Out-of-Order CPU Core Microarchitecture
A structural diagram showing the internal pipeline stages of a modern superscalar out-of-order CPU core. Demonstrates multi-stage vertical flow with parallel paths, fan-out patterns for execution ports, and a separate memory hierarchy sidebar.
## Key Patterns Used
- **Multi-stage vertical flow**: Five pipeline stages (Front End → Rename → Schedule → Execute → Retire)
- **Parallel decode paths**: Main decode and µop cache bypass (dashed line for cache hit)
- **Container grouping**: Logical stages grouped in colored containers
- **Fan-out pattern**: Single scheduler dispatching to 6 execution ports
- **Sidebar layout**: Memory hierarchy placed in separate column on right
- **Stage labels**: Left-aligned labels indicating pipeline phase
- **Color-coded semantics**: Different colors for each functional unit category
## Diagram Type
This is a **hybrid structural/flow** diagram:
- **Flow aspect**: Instructions move top-to-bottom through pipeline stages
- **Structural aspect**: Components are grouped by function (rename unit, execution cluster)
- **Sidebar**: Memory hierarchy is architecturally separate but connected via data paths
## Pipeline Stage Breakdown
### Front End (Purple)
```xml
Fetch unit
6-wide, 32B/cycle
Branch predictor
Decode
x86 → µops, 6-wide
```
### µop Cache Bypass Path (Teal)
The µop cache (Decoded Stream Buffer) provides an alternate path that bypasses the complex decoder:
```xml
µop cache (DSB)
4K entries, 8-wide
hit
```
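A dashed bypass connector of this kind might be drawn as in the sketch below. It is not the diagram's actual markup: the coordinates, box size, marker id `arrow`, grey stroke color, and the end point of the dashed line are all assumptions for illustration. Only the labels and the `c-teal` class come from this document.

```xml
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 240 150" width="240" height="150">
  <defs>
    <!-- Arrowhead marker; the id "arrow" is a hypothetical name -->
    <marker id="arrow" viewBox="0 0 10 10" refX="9" refY="5"
            markerWidth="6" markerHeight="6" orient="auto">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#888"/>
    </marker>
  </defs>

  <!-- µop cache box; the fill and stroke attributes are fallbacks that a
       stylesheet rule for c-teal would override -->
  <g class="c-teal">
    <rect x="40" y="10" width="160" height="50" rx="6" fill="none" stroke="#888"/>
    <text x="120" y="32" text-anchor="middle" font-size="13">µop cache (DSB)</text>
    <text x="120" y="48" text-anchor="middle" font-size="10">4K entries, 8-wide</text>
  </g>

  <!-- Dashed connector marks the cache-hit path; where it ends is assumed -->
  <line x1="120" y1="60" x2="120" y2="130" stroke="#888" stroke-width="1.5"
        stroke-dasharray="5 4" marker-end="url(#arrow)"/>
  <text x="128" y="98" font-size="10" fill="#888">hit</text>
</svg>
```

The `stroke-dasharray` attribute is what separates the bypass from the solid main decode path; everything else matches an ordinary connector.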
### Rename/Allocate Container (Coral)
Groups related rename components in a container:
```xml
Rename / allocate
Map architectural → physical registers
Register alias table
180 physical regs
```
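The container grouping could look roughly like the sketch below, assuming the container spans the main pipeline column (x=40 to x=570) and uses the 20px padding given in the layout notes. The y coordinates, box heights, corner radii, and font sizes are illustrative guesses, and the `c-coral` class is assumed to be styled by a stylesheet that is not shown here.

```xml
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 820 170" width="820" height="170">
  <g class="c-coral">
    <!-- Outer container spanning the main pipeline column (x=40 to x=570) -->
    <rect x="40" y="20" width="530" height="130" rx="10"
          fill="none" stroke="#888"/>
    <text x="60" y="44" font-size="14">Rename / allocate</text>
    <text x="60" y="60" font-size="10">Map architectural → physical registers</text>

    <!-- Inner component box, inset 20px from the container's left and
         bottom edges (x: 60 - 40 = 20, y: 150 - 130 = 20) -->
    <rect x="60" y="72" width="220" height="58" rx="6"
          fill="none" stroke="#888"/>
    <text x="170" y="96" text-anchor="middle" font-size="13">Register alias table</text>
    <text x="170" y="114" text-anchor="middle" font-size="10">180 physical regs</text>
  </g>
</svg>
```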
### Scheduler Fan-Out Pattern (Amber → Teal)
Single unified scheduler dispatching to multiple execution ports:
```xml
Unified scheduler
97 entries, out-of-order dispatch
```
### Execution Port Box Pattern
Compact boxes showing port number and capabilities:
```xml
Port 0
ALU
DIV
```
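The fan-out and port boxes can be sketched together. The example below assumes the six port centers sit 80px apart and are centered on the midpoint of the main column (x=305), which gives centers at x = 105, 185, 265, 345, 425, 505. Box sizes, y coordinates, and stroke colors are hypothetical. Only Port 0's capabilities are given in this document, so the other ports carry just their number.

```xml
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 820 190" width="820" height="190">
  <!-- Unified scheduler, centered on the main column midpoint x=305 -->
  <g class="c-amber">
    <rect x="105" y="10" width="400" height="44" rx="6" fill="none" stroke="#888"/>
    <text x="305" y="29" text-anchor="middle" font-size="13">Unified scheduler</text>
    <text x="305" y="45" text-anchor="middle" font-size="10">97 entries, out-of-order dispatch</text>
  </g>

  <!-- Fan-out: one line from the scheduler's bottom edge to each port's top edge -->
  <g stroke="#888" stroke-width="1">
    <line x1="305" y1="54" x2="105" y2="110"/>
    <line x1="305" y1="54" x2="185" y2="110"/>
    <line x1="305" y1="54" x2="265" y2="110"/>
    <line x1="305" y1="54" x2="345" y2="110"/>
    <line x1="305" y1="54" x2="425" y2="110"/>
    <line x1="305" y1="54" x2="505" y2="110"/>
  </g>

  <!-- Port boxes: each group is translated to its center, 80px apart -->
  <g class="c-teal" font-size="12" text-anchor="middle">
    <g transform="translate(105,110)">
      <rect x="-32" y="0" width="64" height="56" rx="6" fill="none" stroke="#888"/>
      <text x="0" y="18">Port 0</text>
      <text x="0" y="34" font-size="10">ALU</text>
      <text x="0" y="48" font-size="10">DIV</text>
    </g>
    <g transform="translate(185,110)">
      <rect x="-32" y="0" width="64" height="56" rx="6" fill="none" stroke="#888"/>
      <text x="0" y="18">Port 1</text>
    </g>
    <g transform="translate(265,110)">
      <rect x="-32" y="0" width="64" height="56" rx="6" fill="none" stroke="#888"/>
      <text x="0" y="18">Port 2</text>
    </g>
    <g transform="translate(345,110)">
      <rect x="-32" y="0" width="64" height="56" rx="6" fill="none" stroke="#888"/>
      <text x="0" y="18">Port 3</text>
    </g>
    <g transform="translate(425,110)">
      <rect x="-32" y="0" width="64" height="56" rx="6" fill="none" stroke="#888"/>
      <text x="0" y="18">Port 4</text>
    </g>
    <g transform="translate(505,110)">
      <rect x="-32" y="0" width="64" height="56" rx="6" fill="none" stroke="#888"/>
      <text x="0" y="18">Port 5</text>
    </g>
  </g>
</svg>
```

Drawing each port inside a translated group keeps the box geometry identical across ports, so changing the spacing only means editing the six `translate` values and the matching `x2` coordinates of the fan-out lines.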
### Reorder Buffer (Pink)
Wide horizontal bar at bottom showing retirement:
```xml
Reorder buffer (ROB) — 512 entries, 8-wide retire
```
### Memory Hierarchy Sidebar (Blue)
Separate column showing cache levels:
```xml
Memory hierarchy
L1-I cache
32 KB, 8-way
```
## Connection Patterns
### Instruction Fetch Path
Horizontal arrow from L1-I cache to fetch unit:
```xml
instruction fetch
```
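A labeled horizontal arrow between the sidebar and the main column might be written as below. This is a sketch under assumptions: the box positions and sizes, the marker id `arrow`, and the colors are hypothetical, chosen so the L1-I box falls inside the sidebar range (x=600 to x=790) and the fetch unit inside the main column.

```xml
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 820 90" width="820" height="90">
  <defs>
    <!-- Arrowhead marker; orient="auto" rotates it to follow the line direction -->
    <marker id="arrow" viewBox="0 0 10 10" refX="9" refY="5"
            markerWidth="6" markerHeight="6" orient="auto">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#888"/>
    </marker>
  </defs>

  <!-- Fetch unit in the main pipeline column -->
  <g class="c-purple" text-anchor="middle">
    <rect x="40" y="20" width="260" height="50" rx="6" fill="none" stroke="#888"/>
    <text x="170" y="42" font-size="13">Fetch unit</text>
    <text x="170" y="58" font-size="10">6-wide, 32B/cycle</text>
  </g>

  <!-- L1-I cache in the memory sidebar -->
  <g class="c-blue" text-anchor="middle">
    <rect x="620" y="20" width="150" height="50" rx="6" fill="none" stroke="#888"/>
    <text x="695" y="42" font-size="13">L1-I cache</text>
    <text x="695" y="58" font-size="10">32 KB, 8-way</text>
  </g>

  <!-- Arrow runs right to left: from the cache's left edge to the fetch unit's right edge -->
  <line x1="620" y1="45" x2="300" y2="45" stroke="#888" stroke-width="1.5"
        marker-end="url(#arrow)"/>
  <text x="460" y="38" text-anchor="middle" font-size="10" fill="#888">instruction fetch</text>
</svg>
```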
### Load/Store Path
Connector routed from the load and store execution ports across to the L1-D cache in the memory sidebar:
```xml
load / store
```
### Commit Path (dashed)
Dashed line showing write-back from ROB to register file:
```xml
commit
```
### Path Merge (Decode + µop Cache)
The main decode path and the µop cache path converge into a single connection before entering rename.
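One way to draw the convergence is with two elbow paths that meet at a shared junction point, followed by a single arrow into the rename stage. The sketch below is illustrative only: the box positions, the junction at (305, 100), the marker id `arrow`, and the choice to dash the µop cache leg are assumptions rather than the diagram's real geometry.

```xml
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 620 200" width="620" height="200">
  <defs>
    <marker id="arrow" viewBox="0 0 10 10" refX="9" refY="5"
            markerWidth="6" markerHeight="6" orient="auto">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#888"/>
    </marker>
  </defs>

  <!-- Two source boxes side by side -->
  <g class="c-purple">
    <rect x="40" y="10" width="220" height="44" rx="6" fill="none" stroke="#888"/>
    <text x="150" y="37" text-anchor="middle" font-size="13">Decode</text>
  </g>
  <g class="c-teal">
    <rect x="350" y="10" width="220" height="44" rx="6" fill="none" stroke="#888"/>
    <text x="460" y="37" text-anchor="middle" font-size="13">µop cache (DSB)</text>
  </g>

  <!-- Each elbow drops to y=100, then runs horizontally to the junction at x=305 -->
  <path d="M 150 54 V 100 H 305" fill="none" stroke="#888" stroke-width="1.5"/>
  <path d="M 460 54 V 100 H 305" fill="none" stroke="#888" stroke-width="1.5"
        stroke-dasharray="5 4"/>

  <!-- Single merged arrow from the junction into the rename stage -->
  <line x1="305" y1="100" x2="305" y2="150" stroke="#888" stroke-width="1.5"
        marker-end="url(#arrow)"/>

  <g class="c-coral">
    <rect x="40" y="150" width="530" height="40" rx="6" fill="none" stroke="#888"/>
    <text x="305" y="175" text-anchor="middle" font-size="13">Rename / allocate</text>
  </g>
</svg>
```

Placing the arrowhead only on the merged segment keeps the junction clean, since neither incoming leg ends in a marker.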
## Text Classes
This diagram uses an additional text class for very small labels:
```css
.tx { font-family: system-ui, -apple-system, sans-serif; font-size: 10px; fill: var(--text-secondary); }
```
Used for:
- Execution port capability labels (ALU, Branch, Load, etc.)
- Connection labels (instruction fetch, load/store, commit)
- DRAM latency annotation
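As a usage sketch, the class can be embedded in the SVG's own `<style>` element and applied to a small annotation. The value given to `--text-secondary` here is a placeholder assumption (the real value would come from the host page's theme), and the box geometry is hypothetical.

```xml
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 260 80" width="260" height="80">
  <style>
    /* Placeholder value: the real --text-secondary is assumed to come from the host page */
    svg { --text-secondary: #6b7280; }
    .tx { font-family: system-ui, -apple-system, sans-serif; font-size: 10px; fill: var(--text-secondary); }
  </style>

  <!-- DRAM box with the small latency annotation under the component name -->
  <g class="c-gray">
    <rect x="20" y="15" width="220" height="50" rx="6" fill="none" stroke="#888"/>
    <text x="130" y="37" text-anchor="middle" font-size="13">DRAM</text>
    <text class="tx" x="130" y="53" text-anchor="middle">DDR5-6400, ~80ns</text>
  </g>
</svg>
```

If `--text-secondary` is left undefined, the `fill` declaration becomes invalid at computed-value time and the text falls back to the inherited fill, so defining the variable somewhere is worth checking.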
## Color Semantic Mapping
| Color | Stage | Components |
|-------|-------|------------|
| `c-purple` | Front end | Fetch, Branch predictor, Decode |
| `c-teal` | Execution / µop bypass | µop cache, Execution ports |
| `c-coral` | Rename | RAT, Physical RF, Free list |
| `c-amber` | Schedule | Unified scheduler |
| `c-pink` | Retire | Reorder buffer |
| `c-blue` | Memory | L1-I, L1-D, L2 |
| `c-gray` | External | Off-chip DRAM |
## Layout Notes
- **ViewBox**: 820×720 (slightly wider than tall: the vertical pipeline column plus the memory sidebar)
- **Main pipeline**: x=40 to x=570 (530px width)
- **Memory sidebar**: x=600 to x=790 (190px width)
- **Stage labels**: x=30, left-aligned, 50% opacity
- **Vertical spacing**: ~80-100px between major stages
- **Container padding**: 20px inside containers
- **Port spacing**: 80px between execution port centers
- **Legend**: Bottom-right of memory sidebar, explains color coding
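The numbers above translate into an SVG skeleton along the following lines. This is a minimal sketch: the x coordinates, widths, viewBox, and label opacity come from the notes, while every y coordinate, the box heights, the 90px stage pitch (picked from the stated 80-100px range), and the placement of stage labels in the gap above each stage are assumptions.

```xml
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 820 720" width="820" height="720">
  <!-- Stage labels: x=30, left-aligned (text-anchor="start"), 50% opacity -->
  <text x="30" y="32" text-anchor="start" opacity="0.5" font-size="11">Front end</text>
  <!-- Main pipeline stage placeholder: x=40 to x=570 (530px wide) -->
  <rect x="40" y="40" width="530" height="50" rx="6" fill="none" stroke="#888"/>

  <!-- Next stage starts 90px lower, within the 80-100px vertical spacing -->
  <text x="30" y="122" text-anchor="start" opacity="0.5" font-size="11">Rename</text>
  <rect x="40" y="130" width="530" height="50" rx="6" fill="none" stroke="#888"/>

  <!-- Memory sidebar column: x=600 to x=790 (190px wide) -->
  <g class="c-blue">
    <rect x="600" y="40" width="190" height="400" rx="10" fill="none" stroke="#888"/>
    <text x="620" y="64" font-size="13">Memory hierarchy</text>
  </g>
</svg>
```

The 30px gap between the main column (ending at x=570) and the sidebar (starting at x=600) is the space the horizontal connectors cross.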
## Architectural Details Shown
| Component | Specification | Notes |
|-----------|---------------|-------|
| Fetch | 6-wide, 32B/cycle | Representative of recent Intel/AMD x86 cores |
| Decode | 6-wide, x86→µops | Complex decoder |
| µop Cache | 4K entries, 8-wide | Bypass for hot code |
| RAT / physical RF | 180 physical regs | A larger register file allows more instructions in flight |
| Scheduler | 97 entries | Unified RS |
| Execution | 6 ports | ALU×2, Load, Store×2, Vector |
| ROB | 512 entries, 8-wide | In-order retirement |
| L1-I | 32 KB, 8-way | Instruction cache |
| L1-D | 48 KB, 12-way | Data cache |
| L2 | 1.25 MB, 20-way | Unified |
| DRAM | DDR5-6400, ~80ns | Off-chip |
## When to Use This Pattern
Use this diagram style for:
- CPU/GPU microarchitecture visualization
- Compiler pipeline stages
- Network packet processing pipelines
- Any system with parallel execution units fed by a scheduler
- Hardware designs with multiple functional units