Information Processing Society of Japan (IPSJ) 83rd National Convention  Dates: March 18-20, 2021  Venue: Online

6P-03
Update Reward Function based on Accumulated Data
○中澤耕平(University of Southampton),中里研一(BOSCH)
Efficient reward functions can shorten training time in reinforcement learning, but they can also restrict exploration of the solution space. Here, we study a sub-reward function using the Curling problem, in which the goal is to stop a stone launched at a constant velocity by applying various opposing forces. In our procedure, the accumulated data (position and velocity) are classified into groups according to the final reward, and the sub-reward is updated based on these groups. Consequently, the optimised reward function leads the agent to execute quick commands without the programmer intentionally restricting its exploration of the solution space. Finally, we discuss practical situations in which our method could be applied.
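The grouping-and-update step described in the abstract might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how states are binned, how many groups are used, or the update rule. The function names (`cell`, `group_index`, `update_sub_reward`), the grid step, the reward thresholds, and the choice to score each cell by the mean group index of the episodes that visited it are all hypothetical.

```python
from collections import defaultdict


def cell(position, velocity, step=0.5):
    """Discretise a (position, velocity) state into a grid cell.

    The uniform grid and its step size are assumptions made for this sketch.
    """
    return (int(position // step), int(velocity // step))


def group_index(final_reward, thresholds):
    """Classify an episode by its final reward.

    Returns how many thresholds the final reward reaches, so group 0 is the
    lowest-reward group and group len(thresholds) is the highest.
    """
    return sum(final_reward >= t for t in thresholds)


def update_sub_reward(episodes, thresholds, step=0.5):
    """Build a sub-reward table from accumulated episodes (hypothetical rule).

    episodes: list of (states, final_reward), where states is a list of
    (position, velocity) pairs recorded during that episode.
    Each visited cell receives the mean group index of the episodes that
    passed through it, rescaled to the range [0, 1].
    """
    if not thresholds:
        raise ValueError("at least one reward threshold is required")

    totals = defaultdict(float)
    counts = defaultdict(int)
    for states, final_reward in episodes:
        g = group_index(final_reward, thresholds)
        for position, velocity in states:
            key = cell(position, velocity, step)
            totals[key] += g
            counts[key] += 1

    highest_group = len(thresholds)
    return {k: totals[k] / counts[k] / highest_group for k in totals}
```

In this sketch, cells visited mostly by high-reward episodes get a sub-reward near 1 and cells visited mostly by low-reward episodes get a sub-reward near 0. Calling `update_sub_reward` again as more episodes accumulate rebuilds the table from the data, so the programmer does not have to hand-design it.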