Model parallel experiment

Jun 12, 2024

I am currently working on distributed training of ML models.
There are two main approaches: data parallelism and model parallelism.
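
To make the difference concrete, here is a toy sketch in plain Python. It is not PyTorch code: the "model" is a hypothetical list of four functions and the "workers" are just loops, so every name here is an assumption made for illustration. Data parallel gives every worker the full model and a slice of the batch; model parallel gives every worker a slice of the layers and passes the whole batch through them in order.

```python
from typing import Callable, List

Layer = Callable[[float], float]

# Hypothetical 4-layer "model"; each layer is a simple function.
LAYERS: List[Layer] = [
    lambda x: x + 1.0,
    lambda x: x * 2.0,
    lambda x: x - 3.0,
    lambda x: x * x,
]


def run(stack: List[Layer], batch: List[float]) -> List[float]:
    """Apply every layer in `stack`, in order, to every sample in `batch`."""
    out = list(batch)
    for layer in stack:
        out = [layer(x) for x in out]
    return out


def data_parallel(batch: List[float], num_workers: int) -> List[float]:
    """Each worker holds the FULL model and processes one slice of the batch."""
    chunk = max(1, -(-len(batch) // num_workers))  # ceiling division
    shards = [batch[i:i + chunk] for i in range(0, len(batch), chunk)]
    results: List[float] = []
    for shard in shards:  # on real hardware these would run concurrently
        results.extend(run(LAYERS, shard))
    return results


def model_parallel(batch: List[float], num_workers: int) -> List[float]:
    """Each worker holds a SLICE of the layers; the whole batch flows through them."""
    chunk = max(1, -(-len(LAYERS) // num_workers))
    stages = [LAYERS[i:i + chunk] for i in range(0, len(LAYERS), chunk)]
    out = list(batch)
    for stage in stages:  # stage k must wait for stage k-1 to finish
        out = run(stage, out)
    return out


batch = [0.0, 1.0, 2.0, 3.0]
print(data_parallel(batch, 2))
print(model_parallel(batch, 2))
```

Both functions return the same values as running the model on one worker. The difference is only in what gets split, and in the model parallel case the stages run one after another, which is why it does not speed anything up by itself.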

Tools: PyTorch, Ray, DeepSpeed.

I just played around with PyTorch and got two different results:

Pipelined model parallelism is the optimized form of model parallelism, and it is expected to be faster than a single GPU. The left picture shows a different result; I suspect a weird GPU configuration is the cause. The right picture is closer to the expected result, but pipelining is still slower than a single GPU.
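
One way to reason about why pipelining can still lose to a single GPU is a back-of-the-envelope cost model. The sketch below is an idealized assumption, not a measurement from this experiment: it assumes compute time scales linearly with batch size, that the stages are perfectly balanced, and that each micro-batch pays a fixed overhead (copies between GPUs, kernel launches). The function names and all numbers are made up for illustration.

```python
def naive_model_parallel_time(batch_time: float, num_stages: int,
                              copy_cost: float) -> float:
    """Stages run one after another, so only one GPU is busy at a time.

    Total time is the single-GPU time plus one copy per stage boundary.
    """
    return batch_time + (num_stages - 1) * copy_cost


def pipelined_time(batch_time: float, num_stages: int, num_splits: int,
                   per_split_overhead: float) -> float:
    """Split the batch into `num_splits` micro-batches and overlap the stages.

    Each pipeline step processes one micro-batch on one stage. Filling and
    draining the pipeline takes (num_splits + num_stages - 1) steps in total.
    """
    step_time = batch_time / (num_stages * num_splits) + per_split_overhead
    return (num_splits + num_stages - 1) * step_time


single_gpu = 1.0  # hypothetical: one batch takes 1.0 time units on one GPU

# No overhead: pipelining beats the single GPU.
print(pipelined_time(single_gpu, num_stages=2, num_splits=4, per_split_overhead=0.0))

# Overhead of 0.1 per micro-batch step: pipelining is now slower than one GPU.
print(pipelined_time(single_gpu, num_stages=2, num_splits=4, per_split_overhead=0.1))

# Naive model parallel is never faster than the single GPU in this model.
print(naive_model_parallel_time(single_gpu, num_stages=2, copy_cost=0.05))
```

Under these assumptions, pipelining only wins when the per-micro-batch overhead is small relative to the compute per micro-batch. If the model or batch is small, or the copies between GPUs are slow, the overhead term can dominate, which would be consistent with the right picture.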


Written by George S