It was ahead of its time, I guess, in the sense that lots of folks weren't working between hg38 and hg19 on the regular like we are now.

https://martin.softf1.com/g/yellow_soap_opera_blog/sorting-accelerator-hardware-idea-01

(archival copy: https://archive.is/62mcl )

> And if your model **is** complex enough to be capable of overfitting a 100M-point dataset, then you are in the same situation as with a 1000-element dataset: do you trust 1% of your data to provide the optimal stopping point, or let 99% have a say in the process? And in general, if you want to keep the training process from overfitting, why not use "usual" regularization instead?

I can't run a meaningful experiment with 100M points at the moment (and doing the "mean estimation" with more data won't change the results there). However, I'll add some perspective on the effect of other regularization methods, following Tanel's remark.
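For context on what "usual" regularization buys: a minimal ridge-regression sketch on synthetic data. Nothing here is from the post's actual experiments; the penalty strength `lam` is an assumed value, not a tuned one. The point is that the penalty does its shrinking without sacrificing any training points to a hold-out set:

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic regression with many weakly informative features (assumed data).
n, d = 50, 40
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:3] = 1.0
y = X @ w_true + rng.normal(0, 0.5, size=n)

lam = 1.0  # L2 penalty strength; fixed here for the sketch, normally tuned

# Ridge fit in closed form -- uses all n points, nothing held out:
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# The penalty shrinks the coefficients relative to plain least squares.
print(np.linalg.norm(w_ridge) < np.linalg.norm(w_ols))  # True
```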

I think I see another way to turn your comment into an interesting observation: why not include confidence intervals in the early-stopping criterion? That is, stop only when the difference between validation and training error becomes statistically significant. I don't think I have ever seen that done in practice. I'll try it and extend the post if it produces interesting results.
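A rough sketch of what such a criterion could look like, assuming per-example losses are available; the two-sample z-test and the 1.96 threshold are my assumptions, not an established recipe:

```python
import numpy as np

def significant_gap(train_losses, val_losses, z_thresh=1.96):
    """True when mean(val) - mean(train) is significant under a plain
    two-sample z-test (normal approximation; threshold is an assumption)."""
    gap = val_losses.mean() - train_losses.mean()
    se = np.sqrt(train_losses.var(ddof=1) / len(train_losses)
                 + val_losses.var(ddof=1) / len(val_losses))
    return bool(se > 0 and gap / se > z_thresh)

rng = np.random.default_rng(0)
# Early in training: per-example losses on both sets look alike.
train = rng.normal(1.0, 0.5, size=1000)
val_early = rng.normal(1.0, 0.5, size=1000)
# Later: the validation loss has drifted upward -- train until this fires.
val_late = rng.normal(1.3, 0.5, size=1000)
print(significant_gap(train, val_early), significant_gap(train, val_late))
```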

> Potato-potatoe.

Well, not really. Once you use the validation error to stop training, you can no longer claim it provides an unbiased estimate.

> Hence, early stopping is a mechanism of reducing training error

"Generalization error", you meant. True, it is a "mechanism". The question is whether it is a reasonable mechanism compared with other regularization techniques.

> Show the graphs when the training set is fixed and you choose the validation set by fetching new elements.

> No, I deliberately did not do it, because in practice there is never a situation where you have a fixed training set and need to choose the size of the validation set. In practice you have a fixed "full" training set and must decide whether to use a separate validation set, and if so, what its size should be. That is, you must trade training data for validation data.

In many settings, sampling a reasonable 100,000-element validation set is essentially free, for example when your training set is on the order of 100,000,000 points. In such a setting, carving out a validation set does not decrease the ability to learn. So please stop apologising and show the results for this setting as well.

> Early stopping is a technique to reduce the bias in training error estimate.

> *Hold-out validation* is such a technique. Early stopping is the process of *stopping training* based on that separate estimate.

Potato-potatoe. You stop early. Why? Because further minimisation of the training error is pointless! Why? Because the training error estimates in further iterations would only be more biased. That is, early stopping makes sense only if it allows you to choose a model with lower loss, and that is possible only if the bias (optimism) in the training error over the next iterations grows faster than the training error itself decreases.

Hence, early stopping is a mechanism of reducing training error. Early stopping is successful only if the optimism in the final training error is smaller than the optimism would have been without the early abort.
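For concreteness, the mechanism both sides are arguing about can be sketched as a generic patience-based loop; the toy validation curve and all names are illustrative, not anyone's actual setup:

```python
def train_with_early_stopping(max_steps, val_loss, patience=5):
    """Generic patience-based early stopping: remember the best step seen
    on the validation estimate and abort after `patience` steps without
    improvement.  `val_loss` stands in for a real train-then-evaluate step."""
    best_val, best_step, bad = float("inf"), 0, 0
    for t in range(max_steps):
        v = val_loss(t)
        if v < best_val:
            best_val, best_step, bad = v, t, 0
        else:
            bad += 1
            if bad >= patience:
                break
    return best_step, best_val

# Toy validation curve that turns upward at step 30 (purely illustrative):
step, loss = train_with_early_stopping(200, lambda t: 0.5 + 0.001 * (t - 30) ** 2)
print(step, loss)  # 30 0.5
```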

> Show the graphs when the training set is fixed and you choose the validation set by fetching new elements.

No, I deliberately did not do it, because in practice there is never a situation where you have a fixed training set and need to choose the size of the validation set. In practice you have a fixed "full" training set and must decide whether to use a separate validation set, and if so, what its size should be. That is, you must trade training data for validation data.

> Early stopping is a technique to reduce the bias in training error estimate.

*Hold-out validation* is such a technique. Early stopping is the process of *stopping training* based on that separate estimate.

> By the explanation given above it is clear that early stopping is useful when the training error reaches a plateau.

Well, again, in practice that is rarely the case. By the time your training error reaches a plateau, the validation error has long since gone up.

Early stopping is a technique to reduce the bias in the training error estimate. Recall that the training error estimate does not approximate the true loss, due to bias from multiple hypothesis testing: in training you effectively test thousands if not millions of distinct hypotheses. As you choose (or at least try to choose) the hypothesis (model) with the lowest observable error estimate, you bias the estimate towards zero. If there are no clear winners in the set of possible models, the bias is very significant, since the number of hypotheses effectively tested in parallel is very large.
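This selection bias is easy to demonstrate numerically. A small simulation with many equally good models (all parameters are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)

# A thousand candidate models, all with the SAME true loss of 0.5;
# each observed error estimate is the true loss plus sampling noise.
true_loss, n_models = 0.5, 1000
observed = true_loss + rng.normal(0, 0.05, size=n_models)

single = observed[0]       # fixing one hypothesis in advance: unbiased
winner = observed.min()    # picking the apparent best: biased towards zero
print(round(float(single), 3), round(float(winner), 3))
```

The "winner" looks substantially better than 0.5 even though no model actually is.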

In the early-stopping setting you estimate the loss of a single function at each iteration, since by fitting you have already chosen that single function. Thus, for a single iteration, you have reduced the multiple-testing problem to a single hypothesis test, and your estimate of the overall loss is unbiased.

As you run early stopping over a number of iterations, the multiple-testing setting creeps in again. By the algorithm's design, you choose the model with the lowest error on the validation set. As a result, the error estimate on the validation set is again biased downward.

The early-stopping strategy makes sense only if the bias on the training set is much larger than on the validation set. When there are many roughly equal models in the model class, say millions, and you do hundreds of iterations with early stopping, the training error bias is much larger than the validation set bias, and early stopping helps you select better models.
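The claimed asymmetry can be illustrated with the same kind of simulation; the counts (a million parallel hypotheses versus a few hundred iterations) are illustrative assumptions, not measurements:

```python
import numpy as np

rng = np.random.default_rng(1)

true_loss, sigma = 0.5, 0.05   # equally good models, noisy error estimates

# Training effectively searches millions of roughly equal models in parallel;
# early stopping only compares a few hundred iterations on the validation set.
train_search = true_loss + rng.normal(0, sigma, size=1_000_000)
val_search = true_loss + rng.normal(0, sigma, size=300)

train_bias = true_loss - train_search.min()   # optimism of the training error
val_bias = true_loss - val_search.min()       # optimism of the validation error
print(round(float(train_bias), 3), round(float(val_bias), 3))
```

Both minima are biased below the true loss, but the training side, with vastly more implicit comparisons, is far more optimistic.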

To be formally correct and to quantify the effects, I would have to introduce the right complexity measure for model classes with infinitely many members, but on an intuitive level the effect should be clear.

By the explanation given above it is clear that early stopping is useful when the training error reaches a plateau. Then, by construction, you are in the setting where you have a gazillion almost-equivalent models and a small drop in training error can be explained by pure luck.

Is there an easy way to add the percentage value to the numbers inside the Venn diagram?

I tried to figure it out myself but I can't seem to get it working.