Lossless LLM compression for efficient GPU inference via dynamic-length float (arxiv.org)
411 points | CharlesW | a year ago