Serving 70B-scale LLMs efficiently on low-resource edge devices [pdf] (arxiv.org) | 248 points by simonpure | 2 years ago