In this paper, we propose a method for generating subgoals for reinforcement learning in partially observable Markov decision processes (POMDPs). A POMDP is an environment in which an agent cannot tell several states apart because it receives the same observation from the environment in each of them.
To resolve this problem, we propose a genetic algorithm that dynamically generates subgoals for reinforcement learning. Subgoals are constructed as conditional formulas. The number of subgoals is not tuned in our method, and each agent arrives at a different solution because the agents behave independently. We confirmed the effectiveness of our method through experiments with partially observable mazes.
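
The state confusion that motivates this work can be illustrated with a minimal sketch. It is not the method of this paper: the maze layout, the four-direction wall observation, and the names `MAZE`, `observe`, and `aliased_states` are all assumptions made for illustration. The sketch only shows how distinct maze cells can yield an identical observation.

```python
from collections import defaultdict

# Hypothetical corridor maze (an assumption, not taken from the paper):
# '#' is a wall, '.' is a free cell.
MAZE = [
    "#######",
    "#.....#",
    "#######",
]


def observe(maze, row, col):
    """Return wall flags for the (north, east, south, west) neighbours of a cell."""
    neighbours = [(row - 1, col), (row, col + 1), (row + 1, col), (row, col - 1)]
    return tuple(maze[r][c] == "#" for r, c in neighbours)


def aliased_states(maze):
    """Group free cells by observation and keep the groups holding several cells."""
    groups = defaultdict(list)
    for r, line in enumerate(maze):
        for c, ch in enumerate(line):
            if ch == ".":
                groups[observe(maze, r, c)].append((r, c))
    return {obs: cells for obs, cells in groups.items() if len(cells) > 1}


if __name__ == "__main__":
    for obs, cells in aliased_states(MAZE).items():
        print(obs, cells)
```

In this assumed maze the three inner corridor cells share one observation (walls to the north and south only), so an agent acting on the current observation alone cannot tell them apart.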
