Every Model Learned by Gradient Descent Is Approximately a Kernel Machinearxiv.org406 pointsscottlocklin6 years ago