1.Find a single-number evaluation metric so you can tell whether an idea works; the purpose is to let you iterate on experiments quickly.
2.Recall, precision, and F1 score (the harmonic mean of recall and precision, not the plain average).
3.Optimizing metric (the one I care about most, pushed as far as possible) and satisficing metric (second priority; it only has to be good enough).
4.Choose the val (dev) set and the test set from the same distribution.
5.Understanding human-level accuracy (bias/variance analysis)
this applies while ML accuracy is still lower than human-level accuracy
if training accuracy is far below human-level accuracy: focus on bias
if training accuracy is close to human-level accuracy but dev accuracy is much lower: focus on variance
6.Error analysis is important; it is a manual step (looking through misclassified examples by hand).
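
Notes 2, 3, and 5 can be sketched in a few lines of Python. This is only a rough illustration, not a definitive implementation: the function names, the use of latency as the satisficing metric, and every number below are hypothetical and were chosen just to make the example concrete.

```python
def f1_score(precision, recall):
    """Note 2: F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def pick_model(candidates, max_latency_ms):
    """Note 3: filter on the satisficing metric, then maximize the optimizing one.

    Assumption for this sketch: latency is the satisficing metric and
    accuracy is the optimizing metric.
    """
    feasible = [c for c in candidates if c["latency_ms"] <= max_latency_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c["accuracy"])


def diagnose(human_error, train_error, dev_error):
    """Note 5: compare the gap to human level with the train/dev gap."""
    avoidable_bias = train_error - human_error  # train vs. human level
    variance = dev_error - train_error          # dev vs. train
    return "bias" if avoidable_bias > variance else "variance"


if __name__ == "__main__":
    # Made-up numbers, only to exercise the functions.
    print(f1_score(0.5, 1.0))  # about 0.667, lower than the plain average 0.75

    models = [
        {"name": "A", "accuracy": 0.90, "latency_ms": 80},
        {"name": "B", "accuracy": 0.95, "latency_ms": 1500},
        {"name": "C", "accuracy": 0.92, "latency_ms": 95},
    ]
    print(pick_model(models, max_latency_ms=100)["name"])  # C

    print(diagnose(human_error=0.01, train_error=0.08, dev_error=0.10))   # bias
    print(diagnose(human_error=0.075, train_error=0.08, dev_error=0.10))  # variance
```

Model B has the best accuracy but fails the latency threshold, so it is dropped before the comparison; that is the whole point of separating the satisficing metric from the optimizing one.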