Post-transformer inference: 224× compression of Llama-70B with improved accuracy (zenodo.org)
72 points by anima-core 6 months ago