My colleagues and I wrote a paper and integrated its method into transformers.
It is both more accurate and faster than NF4.
Compressed Hugging Face models are available for everyone to try: https://huggingface.co/collections/ISTA-DASLab/higgs-675308e...