torch.nn.parallel

Data parallelism is a technique for processing multiple batches of data across multiple devices simultaneously to improve throughput. In PyTorch, the DistributedSampler ensures that each device receives a non-overlapping input batch. The model is replicated on every device; each replica computes gradients on its own batch and synchronizes them with the others using the ring all-reduce algorithm (gradients are passed around the devices in a ring, summed, and then used to update each replica).
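
The sketch below illustrates this pattern with torch.nn.parallel.DistributedDataParallel. It is a minimal example, assuming a single node with one process per GPU, the NCCL backend, and a placeholder linear model and random dataset; the MASTER_ADDR/MASTER_PORT values are illustrative defaults, not requirements.

import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset


def train(rank: int, world_size: int):
    # Each spawned process joins the default process group.
    os.environ["MASTER_ADDR"] = "localhost"  # assumed single-node setup
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    # DistributedSampler partitions the dataset so that each rank
    # draws a non-overlapping shard every epoch.
    dataset = TensorDataset(torch.randn(1024, 10), torch.randn(1024, 1))
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    # DDP replicates the model on each device; gradients are
    # all-reduced across replicas during backward().
    model = DDP(torch.nn.Linear(10, 1).to(rank), device_ids=[rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for x, y in loader:
            x, y = x.to(rank), y.to(rank)
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()  # gradient sync happens here
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(train, args=(world_size,), nprocs=world_size)

Because gradient synchronization is overlapped with the backward pass, each replica holds identical parameters after optimizer.step(), so no separate broadcast of weights is needed between iterations.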
