I’ve conducted a technical audit on the physical limits of H100 GPUs for long-context retrieval. At scale (N=500k+), standard fp16 NxN materialization is physically impossible on a single 80GB card (requires ~500GB HBM).
I’m releasing CTDR (Cold Tensor Deterministic Reasoning) evidence and the Maxwell Dashboard. It allows you to: 1. Visualize the OOM Feasibility Boundary (The NxN Wall). 2. Audit real NVML energy receipts (Joules per query) showing 90.4% SM utilization. 3. Compare your own GPU performance against our baseline.
Honest caveat: At small N, we carry a constant overhead for p-adic quantization. The asymmetry kicks in as scale grows.
Evidence and Audit Repo in git