# Out-of-Order CPU Core Microarchitecture

A structural diagram showing the internal pipeline stages of a modern superscalar out-of-order CPU core. It demonstrates multi-stage vertical flow with parallel paths, fan-out patterns for execution ports, and a separate memory hierarchy sidebar.

## Key Patterns Used

- **Multi-stage vertical flow**: Five pipeline stages (Front End → Rename → Schedule → Execute → Retire)
- **Parallel decode paths**: Main decode and µop cache bypass (dashed line for cache hit)
- **Container grouping**: Logical stages grouped in colored containers
- **Fan-out pattern**: Single scheduler dispatching to 6 execution ports
- **Sidebar layout**: Memory hierarchy placed in a separate column on the right
- **Stage labels**: Left-aligned labels indicating pipeline phase
- **Color-coded semantics**: Different colors for each functional unit category

## Diagram Type

This is a **hybrid structural/flow** diagram:

- **Flow aspect**: Instructions move top-to-bottom through pipeline stages
- **Structural aspect**: Components are grouped by function (rename unit, execution cluster)
- **Sidebar**: Memory hierarchy is architecturally separate but connected via data paths

## Pipeline Stage Breakdown

### Front End (Purple)

```xml
Fetch unit
6-wide, 32B/cycle
Branch predictor
Decode
x86 → µops, 6-wide
```

### µop Cache Bypass Path (Teal)

The µop cache (Decoded Stream Buffer) provides an alternate path that bypasses the complex decoder:

```xml
µop cache (DSB)
4K entries, 8-wide hit
```

### Rename/Allocate Container (Coral)

Groups related rename components in a container:

```xml
Rename / allocate
Map architectural → physical registers
Register alias table
180 physical regs
```

### Scheduler Fan-Out Pattern (Amber → Teal)

Single unified scheduler dispatching to multiple execution ports:

```xml
Unified scheduler
97 entries, out-of-order dispatch
```

### Execution Port Box Pattern

Compact boxes showing port number and capabilities:

```xml
Port 0
ALU
DIV
```

### Reorder Buffer (Pink)

Wide horizontal bar at the bottom showing retirement:

```xml
Reorder buffer (ROB) — 512 entries, 8-wide retire
```

### Memory Hierarchy Sidebar (Blue)

Separate column showing cache levels:

```xml
Memory hierarchy
L1-I cache
32 KB, 8-way
```

## Connection Patterns

### Instruction Fetch Path

Horizontal arrow from L1-I cache to fetch unit:

```xml
instruction fetch
```

### Load/Store Path

Complex path from execution ports to L1-D cache:

```xml
load / store
```

### Commit Path (dashed)

Dashed line showing write-back from ROB to register file:

```xml
commit
```

### Path Merge (Decode + µop Cache)

Two paths converging before rename:

```xml
```

## Text Classes

This diagram uses an additional text class for very small labels:

```css
.tx {
  font-family: system-ui, -apple-system, sans-serif;
  font-size: 10px;
  fill: var(--text-secondary);
}
```

Used for:

- Execution port capability labels (ALU, Branch, Load, etc.)
- Connection labels (instruction fetch, load/store, commit)
- DRAM latency annotation

## Color Semantic Mapping

| Color | Stage | Components |
|-------|-------|------------|
| `c-purple` | Front end | Fetch, Branch predictor, Decode |
| `c-teal` | Bypass, Execute | µop cache, Execution ports |
| `c-coral` | Rename | RAT, Physical RF, Free list |
| `c-amber` | Schedule | Unified scheduler |
| `c-pink` | Retire | Reorder buffer |
| `c-blue` | Memory | L1-I, L1-D, L2 |
| `c-gray` | External | Off-chip DRAM |

## Layout Notes

- **ViewBox**: 820×720 (slightly wider than tall; the extra width holds the memory sidebar beside the vertical pipeline)
- **Main pipeline**: x=40 to x=570 (530px width)
- **Memory sidebar**: x=600 to x=790 (190px width)
- **Stage labels**: x=30, left-aligned, 50% opacity
- **Vertical spacing**: ~80-100px between major stages
- **Container padding**: 20px inside containers
- **Port spacing**: 80px between execution port centers
- **Legend**: Bottom-right of memory sidebar, explains color coding

## Architectural Details Shown

| Component | Specification | Notes |
|-----------|---------------|-------|
| Fetch | 6-wide, 32B/cycle | Typical modern Intel/AMD |
| Decode | 6-wide, x86→µops | Complex decoder |
| µop Cache | 4K entries, 8-wide | Bypass for hot code |
| RAT | 180 physical regs | Supports deep OoO |
| Scheduler | 97 entries | Unified RS |
| Execution | 6 ports | ALU×2, Load, Store×2, Vector |
| ROB | 512 entries, 8-wide | In-order retirement |
| L1-I | 32 KB, 8-way | Instruction cache |
| L1-D | 48 KB, 12-way | Data cache |
| L2 | 1.25 MB, 20-way | Unified |
| DRAM | DDR5-6400, ~80ns | Off-chip |

## When to Use This Pattern

Use this diagram style for:

- CPU/GPU microarchitecture visualization
- Compiler pipeline stages
- Network packet processing pipelines
- Any system with parallel execution units fed by a scheduler
- Hardware designs with multiple functional units
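
When adapting the fan-out pattern to a different number of units, the port positions can be derived from the figures in the Layout Notes. The sketch below is a hypothetical helper, not part of the original diagram source. It assumes the ports are centered horizontally within the main pipeline column (x=40 to x=570), which the notes do not state explicitly, and uses the documented 80px spacing between port centers.

```python
def port_centers(n_ports=6, spacing=80, x_min=40, x_max=570):
    """Return the x coordinate of each port center.

    Assumption: the row of ports is centered within [x_min, x_max].
    Defaults come from the Layout Notes (6 ports, 80px spacing,
    main pipeline spanning x=40 to x=570).
    """
    span = (n_ports - 1) * spacing      # distance from first to last center
    available = x_max - x_min           # width of the main pipeline column
    if span > available:
        raise ValueError("ports do not fit in the available width")
    first = x_min + (available - span) / 2
    return [first + i * spacing for i in range(n_ports)]


if __name__ == "__main__":
    print(port_centers())
```

With the default values the six centers span 400px of the 530px column, leaving 65px between each outer center and the column edge. Port boxes must therefore be narrower than 80px to avoid touching their neighbors.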