Allowing async and sync updating of CPU parameters
For offloaded (CPU-resident) parameters, we can allow both asynchronous and synchronous updates: part of the parameters update asynchronously, while the rest update synchronously.
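As a rough illustration of such a mixed update path, here is a minimal sketch, assuming plain SGD on CPU-resident parameters and a Python background thread for the asynchronous part; `apply_updates`, `sync_params`, and `async_params` are hypothetical names, not from this post:

```python
import threading
import torch

def apply_updates(sync_params, async_params, lr=1e-3):
    # Minimal sketch: synchronous parameters are updated before the next
    # step begins; asynchronous parameters are updated in a background
    # thread that overlaps with the next forward/backward pass (they may
    # be read one step stale, which is the point of async updating).
    def sgd_step(params):
        with torch.no_grad():
            for p in params:
                if p.grad is not None:
                    p.add_(p.grad, alpha=-lr)

    sgd_step(sync_params)  # blocking, in the training thread
    worker = threading.Thread(target=sgd_step, args=(async_params,))
    worker.start()         # non-blocking, overlapped with the next step
    return worker          # caller may join() before a checkpoint
```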
Low-bit gradient norm computation to decide which parameters to update asynchronously
For each parameter, we can compute the norm of its low-bit gradient. If the norm is large, the parameter is updated synchronously; if it is small, it can be updated asynchronously.
We only need the relative ordering of the gradient norms, not their exact values, so we can compute them in low-bit precision to speed up the computation.
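A sketch of this norm-based partition might look as follows, assuming PyTorch; the bfloat16 cast, the `async_ratio` knob, and the function name are assumptions for illustration:

```python
import torch

def split_by_grad_norm(params, async_ratio=0.5):
    # Partition parameters by the norm of their low-bit gradients.
    # Only the relative ordering of the norms matters, so a bfloat16
    # cast is enough and cheaper than a full-precision reduction.
    params = [p for p in params if p.grad is not None]
    norms = torch.stack(
        [p.grad.to(torch.bfloat16).norm().float() for p in params]
    )
    k = int(len(params) * async_ratio)
    # The k smallest-norm (least important) parameters update asynchronously.
    async_idx = set(torch.topk(norms, k, largest=False).indices.tolist())
    sync_params = [p for i, p in enumerate(params) if i not in async_idx]
    async_params = [p for i, p in enumerate(params) if i in async_idx]
    return sync_params, async_params
```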
Gradually increase the async updating ratio as training iterations increase
The loss curve usually decreases dramatically at the beginning of training and then decreases slowly. This implies that parameters change substantially early in training, when all of them are important; after this initial period, parameters change slowly and some of them matter less.
So we can gradually increase the asynchronous updating ratio as training iterations increase, letting the unimportant parameters update asynchronously.
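One possible schedule is a warm-up phase followed by a linear ramp; all names and constants below are illustrative assumptions, not prescribed by this post:

```python
def async_ratio_schedule(step, warmup_steps=1_000, ramp_steps=10_000, max_ratio=0.8):
    # Early training: every parameter updates synchronously.
    if step < warmup_steps:
        return 0.0
    # Afterwards, linearly ramp the asynchronous fraction up to max_ratio.
    progress = min(1.0, (step - warmup_steps) / ramp_steps)
    return max_ratio * progress
```

Combined with the norm-based split above, each iteration would call something like `split_by_grad_norm(model.parameters(), async_ratio_schedule(step))`, so the least important parameters are progressively moved to the asynchronous path.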