Are Big Models and Big Data Equally Important? (Part 1)

George S
3 min read · May 12, 2023

In general, when training models, more parameters and more data bring higher accuracy. There is also a phenomenon called emergent abilities of large language models: an ability is emergent if it is not present in smaller models but is present in larger models (link).

A typical example of an emergent ability looks like this: accuracy stays near random as model scale (parameter count) grows, then skyrockets once the model becomes very large. That is the moment the model starts to learn the task effectively.
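To make the shape of that curve concrete, here is a minimal sketch that plots a synthetic emergent scaling curve. Every number in it (the 25% random baseline, the hypothetical 10B-parameter threshold, the sigmoid shape) is an illustrative assumption, not a result from any real benchmark.

```python
import numpy as np
import matplotlib.pyplot as plt

# Purely illustrative, synthetic numbers -- not real benchmark results.
# Model scale in parameters, log-spaced from 10M to 1T.
params = np.logspace(7, 12, 50)

# Accuracy sits near random guessing, then jumps sharply past a
# (hypothetical) critical scale; modeled with a sigmoid in log-param space.
random_baseline = 0.25      # e.g. 4-way multiple choice
critical_scale = 1e10       # assumed threshold where the ability "emerges"
accuracy = random_baseline + (0.9 - random_baseline) / (
    1.0 + np.exp(-4.0 * (np.log10(params) - np.log10(critical_scale)))
)

plt.semilogx(params, accuracy)
plt.axhline(random_baseline, linestyle="--", label="random guessing")
plt.xlabel("Model scale (parameters)")
plt.ylabel("Task accuracy")
plt.title("Emergent ability: flat, then a sharp jump")
plt.legend()
plt.show()
```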

Emergent abilities are not limited to one specific case. Across several tasks, the model fails to learn at small scale, but once the parameter count is scaled up it suddenly starts to learn. Methods like Chain of Thought, instruction tuning, Scratchpad, and calibration are among the settings where emergent abilities show up. Interestingly, calibration is a way to tell whether a model "knows what it says": research shows that large models are well calibrated (their confidence tracks their actual accuracy), while smaller models sometimes just generate a random token with low, uninformative confidence.
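Calibration can be quantified. Below is a minimal sketch of expected calibration error (ECE), one standard way to measure whether a model's confidence tracks its accuracy; the function name and the toy numbers are my own illustration, not taken from the lecture.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: the gap between how confident a model is and how often it is
    actually right, averaged over confidence bins (weighted by bin size)."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            avg_conf = confidences[mask].mean()  # how sure the model was
            avg_acc = correct[mask].mean()       # how often it was right
            ece += mask.mean() * abs(avg_conf - avg_acc)
    return ece

# Toy, made-up numbers: a well-calibrated model's confidence tracks its
# accuracy (low ECE); a confidently-wrong model scores a high ECE.
conf = [0.9, 0.8, 0.95, 0.6, 0.55]
hit  = [1,   1,   1,    0,   1]
print(expected_calibration_error(conf, hit))
```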

We know that, in general, model size correlates with performance. But is that always true?

The Inverse Scaling Prize held a competition to find tasks where model size does not correlate with performance (or where performance even gets worse as models grow); several of the submitted examples reveal this phenomenon.

However, a later paper reveals "U-shaped scaling": performance decreases up to a certain model size, but increases again once the model grows beyond it. So, in short, bigger models still correlate with better performance overall. But this also raises more research questions.
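As a concrete illustration of the three curve shapes discussed here, this rough sketch (with made-up numbers) classifies a scaling curve as standard, inverse, or U-shaped based on where the worst score falls:

```python
def curve_shape(sizes, scores):
    """Crudely classify a scaling curve from (model size, score) points,
    assumed sorted by size: 'monotonic up', 'inverse', or 'U-shaped'."""
    worst = scores.index(min(scores))
    if worst == 0:
        return "monotonic up"   # standard scaling: bigger is better
    if worst == len(scores) - 1:
        return "inverse"        # bigger is worse throughout
    return "U-shaped"           # dips at mid scale, recovers when larger

# Made-up numbers in the spirit of the paper's finding: performance dips,
# then recovers once the model gets large enough.
sizes  = [1e8, 1e9, 1e10, 1e11, 1e12]
scores = [0.60, 0.55, 0.48, 0.57, 0.70]
print(curve_shape(sizes, scores))   # -> "U-shaped"
```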

When training LLMs, if performance is poor, is model size the cause? How big a model are we talking about, and how should we improve it? Is it always the right call to reach for a larger-scale model?

Ref: https://www.youtube.com/watch?v=SaZTJJNOCOY&list=PLJV_el3uVTsOePyfmkfivYZ7Rqr2nMk3W&index=11&t=3s
