Abstract
IB-002
Involving CPUs into Multi-GPU Deep Learning
Tung D. Le (IBM Japan), Taro Sekiyama (NII), Yasushi Negishi, Haruki Imai, Kiyokuni Kawachiya (IBM Japan)
Data parallelism is widely used for training deep neural networks on multiple GPUs in a single machine thanks to its simplicity. However, its scalability is bounded by the cost of data transfers, mainly for exchanging and accumulating gradients among the GPUs. In this paper, we present a novel approach to data parallel training called CPU-GPU data parallel (CGDP) training, which utilizes free CPU time on the host to speed up training on the GPUs. We also present a cost model for analyzing and comparing the performance of both typical data parallel training and CGDP training.
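The core idea can be illustrated with a short sketch. The following is a minimal sketch in PyTorch (an assumption; the abstract does not name a framework, and this is not the authors' implementation) of offloading gradient accumulation to the host: each GPU copies its gradients into pinned host memory, the CPU sums them while the GPUs are otherwise free, and the summed gradients are copied back before the optimizer step. All names here (cgdp_step, replicas, optimizer_step) are illustrative.

    import torch

    def cgdp_step(replicas, optimizer_step):
        # replicas: list of per-GPU model replicas whose backward() has
        # already populated .grad on each parameter.
        params_per_replica = [list(m.parameters()) for m in replicas]

        # 1. Copy each GPU's gradients into pinned host buffers; copies
        #    from CUDA memory into pinned memory can run asynchronously.
        host_grads = []
        for params in params_per_replica:
            bufs = []
            for p in params:
                buf = torch.empty(p.grad.shape, dtype=p.grad.dtype,
                                  pin_memory=True)
                buf.copy_(p.grad, non_blocking=True)
                bufs.append(buf)
            host_grads.append(bufs)
        torch.cuda.synchronize()  # wait for all device-to-host copies

        # 2. Accumulate the gradients on the CPU, using otherwise-idle
        #    host cores instead of GPU-to-GPU exchanges.
        summed = [torch.stack(gs).sum(dim=0) for gs in zip(*host_grads)]

        # 3. Copy the accumulated gradients back to every replica, then
        #    apply the parameter update.
        for params in params_per_replica:
            for p, g in zip(params, summed):
                p.grad.copy_(g)
        optimizer_step()

In typical data parallel training, step 2 would instead be a gradient exchange among the GPUs; the paper's cost model weighs the host copies and CPU-side accumulation in a scheme like the above against that GPU-side exchange.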