Pipelined model parallelism is the optimized variant of model parallelism, and it is expected to be faster than a single GPU. The left picture shows a different result, which I suspect comes from an unusual GPU configuration. The right picture is closer to the expected result, but it is still slower than a single GPU.
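To make the expectation concrete, here is a rough back-of-the-envelope timing model, not a measurement. It assumes the model splits evenly across stages, that compute time scales linearly with batch size, and that each micro-batch pays a fixed per-stage overhead (kernel launches, device-to-device copies). The function names and the `overhead` parameter are hypothetical and exist only for this illustration.

```python
def single_gpu_time(compute: float) -> float:
    """Time for one batch on a single GPU (arbitrary units)."""
    return compute


def pipelined_time(compute: float, stages: int, micro_batches: int,
                   overhead: float = 0.0) -> float:
    """Idealized time for one batch split into micro-batches across stages.

    Assumption: every stage does an equal share of the work, and each
    micro-batch pays a fixed `overhead` at every stage.
    """
    # Time one micro-batch spends in one stage.
    step = compute / (stages * micro_batches) + overhead
    # The pipeline needs (stages - 1) steps to fill, then finishes
    # one micro-batch per step.
    return (micro_batches + stages - 1) * step


if __name__ == "__main__":
    base = single_gpu_time(1.0)
    for overhead in (0.0, 0.02, 0.1):
        t = pipelined_time(1.0, stages=2, micro_batches=4, overhead=overhead)
        verdict = "faster" if t < base else "slower"
        print(f"overhead={overhead}: pipelined={t:.3f} vs single={base:.3f} ({verdict})")
```

Under these assumptions, two stages with four micro-batches beat the single GPU only while the per-micro-batch overhead stays small. Once the overhead grows, the pipelined run falls behind, which is one possible explanation for plots like the ones above. It does not rule out a configuration problem.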