RePo: how transformers re-position tokens by meaning

The core problem

Take the sentence "The capital of France is Paris." Tokens are numbered by slot: The (1), capital (2), of (3), France (4), is (5), Paris (6). "France" and "Paris" sit 2 positions apart, so the model must "travel" across that gap to connect them. Standard models assign position by word order, not by meaning, and related tokens stay far apart in the attention math.
RoPE: each word gets a fixed integer

With RoPE, distance is always ordinal: The 1, capital 2, of 3, France 4, is 5, Paris 6. The France–Paris gap is 2; it is fixed, ordinal, and cannot compress. Positions are static integers (1, 2, 3, 4, 5, 6), so semantically close words still appear far apart in the math.
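To make the "fixed, ordinal" point concrete, here is a minimal NumPy sketch of a RoPE-style rotation (a simplified pairing of dimensions, not any particular library's implementation). The attention logit between a rotated query and key depends only on the integer gap between their slots:

```python
import numpy as np

def rope_rotate(vec, pos, base=10000.0):
    """Rotate a query/key vector by its position, RoPE-style.

    Dimension pairs (x1_i, x2_i) are rotated by an angle pos * freq_i,
    with one frequency per pair.
    """
    half = vec.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)
    angles = pos * freqs
    x1, x2 = vec[:half], vec[half:]
    return np.concatenate([x1 * np.cos(angles) - x2 * np.sin(angles),
                           x1 * np.sin(angles) + x2 * np.cos(angles)])

# Ordinal slots: France is position 4, Paris is position 6.
q_paris = rope_rotate(np.ones(8), pos=6)
k_france = rope_rotate(np.ones(8), pos=4)
logit = q_paris @ k_france  # depends only on the gap 6 - 4 = 2
```

Because the rotation matrices compose, shifting both tokens by the same offset leaves the logit unchanged: the gap of 2 is baked in and cannot compress.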
Inside one transformer layer: RePo adds a GPS side-quest before attention

Before attention computes Q, K, and V, the token's hidden state h (e.g. for the "Paris" token) is extracted after LayerNorm and fed through a small MLP, the "GPS factory" (4096 → 64), producing p, a 64-dimensional "position essence" vector. By layer 12, h knows context (the layer-12 "Paris" state knows it is a capital), and p encodes that meaning as a decimal position, which is then used to rotate Q and K in place of RoPE's integer position.
Same p, different head weight: a different decimal z per head

Each attention head projects the same position essence with its own weight vector to read off a scalar position z. For p_Paris (64d): Head A, a geography specialist, computes z = w_A · p = 2.1, landing near France (2.0); Head B, a grammar specialist, computes z = w_B · p = 5.9, near the period (6.0).
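The two-step read-out above can be sketched as follows. This is an illustrative shape-check only: the MLP weights, activations, and head vectors are random stand-ins (the real model would learn them), so the z values printed here are not the 2.1 / 5.9 from the example; only the dimensions 4096 → 64 → scalar come from the diagram.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "GPS factory": one projection from the 4096-d hidden
# state to the 64-d position essence (random stand-in weights).
W_gps = rng.normal(size=(4096, 64)) * 0.01
h_paris = rng.normal(size=4096)        # layer-12 hidden state for "Paris"

p_paris = np.tanh(h_paris @ W_gps)     # 64-d position essence

# Each head owns its own read-out vector and gets its own scalar z.
w_head_A = rng.normal(size=64)         # geography specialist (assumed)
w_head_B = rng.normal(size=64)         # grammar specialist (assumed)
z_A = w_head_A @ p_paris               # in the article's example: near France (2.0)
z_B = w_head_B @ p_paris               # in the article's example: near the period (6.0)
```

The design point: one cheap side-computation per token yields a shared p, and each head spends only a 64-d dot product to get its own specialized coordinate system.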
Head A (geography lens): semantic clustering in action

Standard positions: The·1, capital·2, of·3, France·4, is·5, Paris·6, a France–Paris gap of 2. RePo positions under Head A: France·2.0, Paris·2.1, a gap of 0.1. The geography cluster is snapped together, while the filler words (The·5.5, is·5.6, capital·5.7) are pushed aside. The words don't move in memory; z changes the RoPE angle so attention sees them as adjacent.
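The "z changes the RoPE angle" claim can be checked directly: feed decimal positions into the same rotation that RoPE uses for integers and compare attention logits. This sketch reuses the simplified rotation from earlier, with the example's Head A coordinates; the shrunken gap produces a larger logit:

```python
import numpy as np

def rope_rotate(vec, pos, base=10000.0):
    """RoPE-style rotation, but `pos` may be any real number."""
    half = vec.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)
    angles = pos * freqs
    x1, x2 = vec[:half], vec[half:]
    return np.concatenate([x1 * np.cos(angles) - x2 * np.sin(angles),
                           x1 * np.sin(angles) + x2 * np.cos(angles)])

q = np.ones(8)
k = np.ones(8)

# Standard ordinal slots: France at 4, Paris at 6 (gap 2).
logit_standard = rope_rotate(q, 6.0) @ rope_rotate(k, 4.0)

# Head A's RePo positions: France at 2.0, Paris at 2.1 (gap 0.1).
logit_repo = rope_rotate(q, 2.1) @ rope_rotate(k, 2.0)

# The smaller rotation gap yields a larger logit: the tokens look
# adjacent to attention even though they never moved in memory.
```

Nothing about the attention mechanism itself changes; only the scalar fed into the rotation does, which is why the tokens stay where they are in the sequence buffer.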