Every model learned by gradient descent is approximately a kernel machine (2020)arxiv.org176 pointsAnon842 years ago