Deep Q-Learning has been successfully applied to a wide variety of tasks in the past several years. However, the architecture of the vanilla Deep Q-Network is not suited to partially observable environments such as 3D video games. To address this, recurrent layers have been added to the Deep Q-Network so that it can handle past dependencies. We use Minecraft here for its customization advantages and design two very simple missions that can be framed as Partially Observable Markov Decision Processes. On these missions we compare the Deep Q-Network and the Deep Recurrent Q-Network in order to see whether the latter, which is trickier and longer to train, is always the better architecture when the agent has to deal with partial observability.

Deep Reinforcement Learning has been highly active since the successful work of Mnih et al. (2013) on Atari 2600 games. Since then, many methods have been applied to a wide range of environments in order to make an agent reach an objective (Justesen et al., 2017). These environments can be framed as Markov Decision Processes (MDPs) defined by the tuple $\langle S, A, P, R \rangle$ where at each timestep $t$