Model Parallel Experiment

George S
Jun 12, 2024

--

I am currently working on distributed training for ML models.
There are two main types of parallelism: data parallel and model parallel. A minimal sketch of the difference is below.
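
To make the distinction concrete, here is a minimal PyTorch sketch of naive model parallelism, where a single model is split across two GPUs. The layer sizes, device names, and two-stage split are illustrative assumptions, not a recipe; it assumes two CUDA devices are available.

```python
import torch
import torch.nn as nn

# Naive model parallelism: one model, split across two GPUs.
# (Illustrative sketch; assumes "cuda:0" and "cuda:1" exist.)
class TwoGPUModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Linear(1024, 1024).to("cuda:0")  # first half on GPU 0
        self.stage2 = nn.Linear(1024, 10).to("cuda:1")    # second half on GPU 1

    def forward(self, x):
        x = torch.relu(self.stage1(x.to("cuda:0")))
        # Activations must be copied between devices at the stage boundary.
        return self.stage2(x.to("cuda:1"))

# Data parallelism, by contrast, replicates the whole model on every GPU
# and splits the batch instead, e.g. with DistributedDataParallel.
```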

Tools: PyTorch, Ray, DeepSpeed.

I just played around with PyTorch and got two different results:

Pipelined model parallelism is the optimized version of model parallelism, and it is expected to be faster than a single GPU. The left picture shows a different result, which I suspect comes from a weird GPU configuration. The right picture is more aligned with the expected outcome, but it is still slower than a single GPU.
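
For reference, here is a minimal sketch of the pipelining idea, along the lines of the PyTorch model-parallel tutorial: each batch is cut into micro-batches so that while GPU 1 runs the second stage on one chunk, GPU 0 already runs the first stage on the next chunk. The stage split, layer sizes, and `split_size` are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Pipelined model parallelism: split each batch into micro-batches so the
# two stages overlap across devices instead of idling in turn.
class PipelinedModel(nn.Module):
    def __init__(self, split_size=32):
        super().__init__()
        self.split_size = split_size  # micro-batch size, a tunable knob
        self.stage1 = nn.Linear(1024, 1024).to("cuda:0")
        self.stage2 = nn.Linear(1024, 10).to("cuda:1")

    def forward(self, x):
        splits = iter(x.split(self.split_size, dim=0))
        # Prime the pipeline with the first micro-batch.
        s_prev = torch.relu(self.stage1(next(splits).to("cuda:0"))).to("cuda:1")
        outputs = []
        for s_next in splits:
            # stage2 consumes the previous chunk on cuda:1 while stage1
            # starts on the next chunk on cuda:0 (CUDA kernels are async).
            outputs.append(self.stage2(s_prev))
            s_prev = torch.relu(self.stage1(s_next.to("cuda:0"))).to("cuda:1")
        outputs.append(self.stage2(s_prev))  # drain the pipeline
        return torch.cat(outputs, dim=0)
```

Even with this overlap, the cross-device activation copies and per-micro-batch launch overhead can dominate for a small model, which may be one reason a pipelined run still loses to a single GPU.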

--

--

George S

Senior ML researcher, sharing knowledge and news in AI.