Allowing async and sync updating of CPU parameters
For offloaded (CPU-resident) parameters, we can allow both asynchronous and synchronous updates: part of the parameters update asynchronously, while the rest update synchronously.
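As a rough illustration of such a mixed update path, here is a minimal sketch, assuming plain SGD on CPU-resident parameters and a Python background thread for the asynchronous part; `apply_updates`, `sync_params`, and `async_params` are hypothetical names, not from this post:

```python
import threading
import torch

def apply_updates(sync_params, async_params, lr=1e-3):
    # Minimal sketch: synchronous parameters are updated before the next
    # step begins; asynchronous parameters are updated in a background
    # thread that overlaps with the next forward/backward pass (they may
    # be read one step stale, which is the point of async updating).
    def sgd_step(params):
        with torch.no_grad():
            for p in params:
                if p.grad is not None:
                    p.add_(p.grad, alpha=-lr)

    sgd_step(sync_params)  # blocking, in the training thread
    worker = threading.Thread(target=sgd_step, args=(async_params,))
    worker.start()         # non-blocking, overlapped with the next step
    return worker          # caller may join() before a checkpoint
```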
Low-bit gradient norm computation to decide which parameters to update asynchronously
For each parameter, we can compute the norm of its low-bit gradient. If the norm is large, the parameter is updated synchronously; if it is small, it can be updated asynchronously.
We only need the relative ordering of the gradient norms, not their exact values, so we can compute them in low-bit precision to speed up the computation.
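A sketch of this norm-based partition might look as follows, assuming PyTorch; the bfloat16 cast, the `async_ratio` knob, and the function name are assumptions for illustration:

```python
import torch

def split_by_grad_norm(params, async_ratio=0.5):
    # Partition parameters by the norm of their low-bit gradients.
    # Only the relative ordering of the norms matters, so a bfloat16
    # cast is enough and cheaper than a full-precision reduction.
    params = [p for p in params if p.grad is not None]
    norms = torch.stack(
        [p.grad.to(torch.bfloat16).norm().float() for p in params]
    )
    k = int(len(params) * async_ratio)
    # The k smallest-norm (least important) parameters update asynchronously.
    async_idx = set(torch.topk(norms, k, largest=False).indices.tolist())
    sync_params = [p for i, p in enumerate(params) if i not in async_idx]
    async_params = [p for i, p in enumerate(params) if i in async_idx]
    return sync_params, async_params
```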
Gradually increase the async updating ratio as training iterations increase
The loss curve usually decreases dramatically at the beginning of training and then decreases slowly. This implies that parameters change substantially early in training, when all of them are important; after this initial period, parameters change slowly and some of them matter less.
So we can gradually increase the asynchronous updating ratio as training iterations increase, letting the unimportant parameters update asynchronously.
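One possible schedule is a warm-up phase followed by a linear ramp; all names and constants below are illustrative assumptions, not prescribed by this post:

```python
def async_ratio_schedule(step, warmup_steps=1_000, ramp_steps=10_000, max_ratio=0.8):
    # Early training: every parameter updates synchronously.
    if step < warmup_steps:
        return 0.0
    # Afterwards, linearly ramp the asynchronous fraction up to max_ratio.
    progress = min(1.0, (step - warmup_steps) / ramp_steps)
    return max_ratio * progress
```

Combined with the norm-based split above, each iteration would call something like `split_by_grad_norm(model.parameters(), async_ratio_schedule(step))`, so the least important parameters are progressively moved to the asynchronous path.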