Many modern neural networks are trained in an over-parameterized regime, where the number of model parameters exceeds the size of the training dataset. Because of this over-parameterization, such models in principle have the capacity to (over)fit any set of labels, including pure noise. Despite this high fitting capacity, somewhat paradoxically, models trained via first-order methods (often with early stopping) continue to predict well on previously unseen test data.
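The fitting-capacity claim above can be illustrated with a minimal numerical sketch (an illustration, not an experiment from this work): an over-parameterized linear model, with more parameters than samples, can exactly interpolate purely random labels.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 20, 100  # n samples, d parameters: over-parameterized since d > n
X = rng.standard_normal((n, d))
y_noise = rng.standard_normal(n)  # pure-noise "labels" with no structure

# Least squares returns the minimum-norm interpolating solution;
# with d > n and generic X, the training data is fit exactly.
w, *_ = np.linalg.lstsq(X, y_noise, rcond=None)

max_residual = np.max(np.abs(X @ w - y_noise))
print(f"max train residual: {max_residual:.2e}")
```

The residual is at machine precision: even noise labels are fit perfectly, which is exactly the capacity the paragraph describes for over-parameterized models.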