working on a data-efficient synthetic data generation method. got a 1.1B-parameter model to beat GPT-4o mini on PubMedQA. the method also beats training on the PubMed train split and on WizardLM Evol-Instruct.
email sonyajin@stanford.edu if you'd like a test run on your research or personal projects.