
Crafter

Posted on 11 October 2023.
Tags: python, linux, reinforcement learning

About

Crafter is an open-world survival game for evaluating a wide range of agent abilities within a single environment. This project attempts to solve part of the Crafter challenge using reinforcement learning.

This repository contains the training script (train.py), the agent implementations (random, dqn, ddqn and their variants), and the analysis/plotting scripts used below.

Usage

Instructions

Follow the installation instructions in the Crafter repository. It’s best to use some kind of virtual environment; my personal favourite is miniconda, although the installation should work with the system’s Python as well.

Example using venv

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

To run the Random Agent, execute:

python train.py --steps 10_000 --eval-interval 2500

You can specify other types of agents using --agent. Implemented agents: random, dqn, ddqn.

This will run the Random Agent for 10_000 steps and evaluate it every 2500 steps for 20 episodes. The results will be written to logdir/random_agent/0, where 0 is the index of the run.
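
For orientation, here is a minimal sketch of how such a training entry point and its run directories could be organized. The CLI flags match the ones above, but everything else is an assumption for illustration, not the project’s actual train.py.

# Hypothetical sketch of how train.py's CLI and run directories could be organized
# (illustrative only, not the project's actual code).
import argparse
from pathlib import Path

def next_run_dir(base: Path) -> Path:
    # Pick the next free numeric run index under e.g. logdir/random_agent/.
    base.mkdir(parents=True, exist_ok=True)
    existing = [int(p.name) for p in base.iterdir() if p.name.isdigit()]
    run_dir = base / str(max(existing) + 1 if existing else 0)
    run_dir.mkdir()
    return run_dir

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--steps", type=int, default=10_000)
    parser.add_argument("--eval-interval", type=int, default=2500)
    parser.add_argument("--agent", default="random")  # e.g. random, dqn, ddqn, ext_duel_ddqn
    parser.add_argument("--logdir", default="logdir")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    run_dir = next_run_dir(Path(args.logdir) / f"{args.agent}_agent")
    print(f"Writing results to {run_dir}")
    # The training loop itself (stepping the agent in the Crafter env for args.steps
    # steps and running a 20-episode evaluation every args.eval_interval steps) is omitted.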

To execute multiple runs in parallel you could do:

for i in $(seq 1 4); do python train.py --steps 250_000 --eval-interval 25_000 & done

You can also run other types of agents:

python train.py --steps 10_000 --eval-interval 2500 --agent dqn

Agent names can also include modifiers, which compose around the base agent like decorators, e.g. ext_duel_ddqn.
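
One way such modifiers could be wired up is to split the agent name on underscores and apply each prefix as a wrapper around the base agent class. The sketch below only illustrates that idea; every class and function name in it is a placeholder, not the project’s actual API.

# Illustrative sketch of decorator-style agent modifiers (all names are placeholders).

class RandomAgent: ...
class DQNAgent: ...
class DDQNAgent: ...

def with_extended_replay_buffer(cls):
    # Placeholder: would return a variant whose buffer is prefilled, e.g. from human gameplay.
    return cls

def with_dueling_head(cls):
    # Placeholder: would return a variant that uses the dueling Q-network head.
    return cls

BASE_AGENTS = {"random": RandomAgent, "dqn": DQNAgent, "ddqn": DDQNAgent}
MODIFIERS = {"ext": with_extended_replay_buffer, "duel": with_dueling_head}

def make_agent(name: str):
    *mods, base = name.split("_")    # "ext_duel_ddqn" -> ["ext", "duel"] and "ddqn"
    cls = BASE_AGENTS[base]
    for mod in reversed(mods):
        cls = MODIFIERS[mod](cls)    # apply modifiers like decorators
    return cls()

agent = make_agent("ext_duel_ddqn")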

Visualization

Finally, you can visualize the stats of the agent across the four runs using:

python analysis/plot_stats.py --logdir logdir/random_agent

You can also visualize the stats of all agents using:

python analysis/plot_comp.py --logdir logdir
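
Both plotting scripts essentially aggregate per-episode statistics across run directories. Below is a minimal sketch of that idea; it assumes each run was recorded with Crafter’s Recorder, which writes a stats.jsonl file (one JSON line per finished episode, including a reward field), and the exact location inside each run directory may differ (e.g. an eval/ subfolder).

# Minimal sketch: aggregate episode returns across the runs of one agent
# (file layout is an assumption; adjust the path to where stats.jsonl actually lives).
import json
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np

def load_returns(run_dir: Path) -> np.ndarray:
    lines = (run_dir / "stats.jsonl").read_text().splitlines()
    return np.array([json.loads(line)["reward"] for line in lines])

logdir = Path("logdir/random_agent")
runs = [load_returns(d) for d in sorted(logdir.iterdir()) if d.is_dir()]

# Truncate all runs to the same number of episodes so they can be stacked.
n = min(len(r) for r in runs)
returns = np.stack([r[:n] for r in runs])

mean, std = returns.mean(axis=0), returns.std(axis=0)
plt.plot(mean, label="mean return")
plt.fill_between(np.arange(n), mean - std, mean + std, alpha=0.3)
plt.xlabel("episode")
plt.ylabel("return")
plt.legend()
plt.savefig("random_agent_returns.png")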

For other performance metrics see the plotting scripts in the original Crafter repo.

Results

Playing the game manually, I observed that the main difficulty comes from dying of starvation. The game becomes really difficult during the night, when there are a lot of enemies. To survive longer you need to wall off a zone from the outside world using stones, but then you run out of food. One thing I was not able to figure out is how to use plants to get food.

At best, the random agent manages to place a table, scoring around 2 or 3 achievements, and its average return is below 2. The agent that manages to beat the random agent is the ext_duel_ddqn agent.

Figure: the architecture of the best agent.

The input to the model is a grayscale image of the game. The network uses the dueling architecture described in the Dueling DQN paper (Wang et al., 2016) and outputs the Q-value associated with each action. During evaluation the agent chooses the action with the maximum Q-value.
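
For illustration, here is a rough sketch of a dueling Q-network: the convolutional features split into a value stream and an advantage stream, which are recombined into Q-values. It assumes a PyTorch implementation with 64x64 grayscale inputs; the layer sizes are guesses, not the project’s actual model.

# Rough sketch of a dueling Q-network in PyTorch (layer sizes are assumptions).
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    def __init__(self, num_actions: int):
        super().__init__()
        # Convolutional encoder over the grayscale observation (1 x 64 x 64 assumed).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        feat = self.encoder(torch.zeros(1, 1, 64, 64)).shape[1]
        # Separate value and advantage streams.
        self.value = nn.Sequential(nn.Linear(feat, 256), nn.ReLU(), nn.Linear(256, 1))
        self.advantage = nn.Sequential(nn.Linear(feat, 256), nn.ReLU(), nn.Linear(256, num_actions))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        h = self.encoder(obs)
        v = self.value(h)          # state value V(s)
        a = self.advantage(h)      # advantages A(s, a)
        # Recombine; subtracting the mean advantage keeps the decomposition identifiable.
        return v + a - a.mean(dim=1, keepdim=True)

# Greedy action selection during evaluation: pick the action with the largest Q-value.
net = DuelingQNet(num_actions=17)    # Crafter's discrete action space has 17 actions
obs = torch.zeros(1, 1, 64, 64)      # dummy grayscale frame
action = net(obs).argmax(dim=1).item()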

Figure: example gameplay. The agent managed to craft a pickaxe before dying to a skeleton archer. F

Figure: the average return of all agents tested for 100_000 steps.

Figure: the success rates of all agents tested for 100_000 steps.

Tasks

  1. More visualization
    • Episodic Reward for eval
    • Loss plot for training
    • Success rate for each achievement
    • Distribution of actions taken with respect to time (step)
    • Compare the methods (reward, success rates)
    • Maybe try to create a plot like the one in the Dueling DQN paper? (requires saving the model, then outputting the last layers and converting them to an image)
  2. More algorithms
    • DQN
    • DDQN
    • Dueling DQN
    • Maybe try to penalize noop
    • Explore intrinsic reward for exploring new states
    • Give extra reward the first time the agent places a table and for similar first-time achievements.
    • Stole an idea from some colleagues and made the random exploration actions be only actions that actually do something
    • Test penalizing actions that are not in the list of actions that do something (e.g. the agent chooses to craft an iron pickaxe but has no iron)
    • Test dropout in the DNN
    • (idk) Test penalizing the same action repeated many times in a row (or apply a diminishing return to repeated actions); if the agent just spams one action, that is a skill issue.
  3. More data
    • Find a dataset with prerecorded good gameplay
    • Record some gameplay using python3 -m crafter.run_gui --record logdir/human_agent/0/eval
    • Create a replay buffer that randomly samples from the prerecorded dataset (a rough sketch of this idea follows after this list)
  4. More test runs to generate better plots
    • 3 Runs with Random
    • 3 Runs with DQN
    • 3 Runs with DDQN
    • 3 Runs with Duel DQN/DDQN, depending on which turns out better I guess
    • 3 Runs with extended replay buffer (from human)
    • 3 Runs with extended epsilon decay replay buffer (from human)
    • 3 Runs with the noop-penalizing environment and all modifiers, YEET
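
As referenced in the data item above, one simple way to mix prerecorded human gameplay into training is a replay buffer that draws a fixed fraction of each batch from the human dataset. This is only a sketch of the idea; the class name, the transition format, and the 25% mixing ratio are all assumptions.

# Rough sketch of a replay buffer mixed with prerecorded human transitions
# (illustrative only; names, transition format and mixing ratio are assumptions).
import random
from collections import deque

class MixedReplayBuffer:
    def __init__(self, capacity, human_transitions, human_fraction=0.25):
        self.online = deque(maxlen=capacity)   # transitions collected by the agent
        self.human = list(human_transitions)   # e.g. (obs, action, reward, next_obs, done) tuples
        self.human_fraction = human_fraction

    def add(self, transition):
        self.online.append(transition)

    def sample(self, batch_size):
        # Draw a fixed fraction from the human data, the rest from online experience.
        n_human = int(batch_size * self.human_fraction) if self.human else 0
        n_online = batch_size - n_human
        batch = random.sample(self.human, min(n_human, len(self.human)))
        batch += random.sample(self.online, min(n_online, len(self.online)))
        random.shuffle(batch)
        return batch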

Conclusion