Diverse LLM subsets via k-means (100K-1M) [Pretraining, IF, Reasoning] (github.com/AmanPriyanshu)
2 points by radii-llm 9 months ago