Authors: Evan Zheran Liu*, Kelvin Guu*, Panupong (Ice) Pasupat*, Tianlin Shi, Percy Liang (* equal contribution)
Source code accompanying our ICLR 2018 paper:
Reinforcement Learning on Web Interfaces using Workflow-Guided Exploration
The goal of this project is to train machine learning models (agents) to do things in a browser that can be specified in natural language, e.g. "Book a flight from San Francisco to New York for Dec 23rd."
- We tested the system with Python 3.9 (e.g., 3.9.3 on Mac and 3.9.13 on Windows)
- Python
  - Install virtualenv:

    ```
    pip3 install virtualenv
    ```

  - Create a virtual environment named `p5`:

    ```
    cd path/to/wge
    python3 -m venv p5
    ```

  - Activate the virtualenv on Mac:

    ```
    source p5/bin/activate
    ```

  - Activate the virtualenv on Windows from the command line (if you are using Windows PowerShell, see the next step):

    ```
    .\p5\Scripts\activate.bat
    ```

  - Activate the virtualenv on Windows from PowerShell:

    ```
    .\p5\Scripts\activate.ps1
    ```
- Python dependencies:

  ```
  pip install -r requirements.txt
  ```

  - If this gives you problems, try again with pip's `--ignore-installed` flag.
- (Mac) Go to the `miniwob-sandbox` folder and run the recording script from the terminal:

  ```
  cd miniwob-sandbox
  python record.py
  ```
- (Windows) The `miniwob-sandbox` folder is a link, which may not work on Windows. If that is the case, use the full path `third-party\miniwob-sandbox` and run the recording script from the command line:

  ```
  cd third-party\miniwob-sandbox
  python record.py
  ```
- If everything works out, you should see the message `Listening on http://localhost:8032/`.
- Open your browser and enter `http://localhost:8032/` in the address bar. You'll see an "Error: 404 Not Found" message, which indicates the server is working correctly.
- Open the task environment:
  - Press Cmd+O (Mac) or Ctrl+O (Windows)
  - Navigate to `miniwob-sandbox/html/miniwob`
  - Select an environment file (choose from):
    - `click-checkboxes-soft.html`
    - `email-inbox-forward-nl.html`
    - `social-media.html`
- To begin recording, append `?record=true` to the URL in your address bar. For example: `file:///path/to/wge/miniwob-sandbox/html/miniwob/social-media.html?record=true`
- Record 10 demonstrations for each environment (30 total recordings).
- Ensure the recording server is still running.
- Open the viewer:
  - Press Cmd+O (Mac) or Ctrl+O (Windows)
  - Navigate to `miniwob-sandbox/viewer`
  - Select `viewer.html`
  - The address should look like `file:///path/to/wge/miniwob-sandbox/viewer/viewer.html`
- Your recordings will appear in the left panel.
- Download GloVe from https://nlp.stanford.edu/data/glove.6B.zip and, after extraction, place it in the `wge/data` directory.
- Run the miniwob server: go to `path/to/wge/miniwob-sandbox/html/` and run the supplied `http-serve`.
  - For Mac:

    ```
    cd path/to/wge/miniwob-sandbox/html/
    ./http-serve
    ```

  - For Windows:

    ```
    cd path\to\wge\miniwob-sandbox\html\
    .\http-serve.bat
    ```
- The server should now be running at `http://localhost:8080/`.
- Next, set the environment variables:
  - For Mac, each time you open a new terminal to run an experiment, set these environment variables:

    ```
    export REPO_DIR=/path/to/wge/
    export RL_DATA=/path/to/wge/data/
    export RL_DEMO_DIR=/path/to/wge/miniwob-sandbox/out/
    export MINIWOB_BASE_URL='http://localhost:8080/'
    ```

  - For Windows, each time you open a new PowerShell window to run an experiment, set these environment variables (the `$env:` syntax below is PowerShell-specific):

    ```
    $env:REPO_DIR="\path\to\wge\"
    $env:RL_DATA="\path\to\wge\data\"
    $env:RL_DEMO_DIR="\path\to\wge\miniwob-sandbox\out\"
    $env:MINIWOB_BASE_URL="http://localhost:8080/"
    ```
- Once you've followed the above steps, test `MiniWoBEnvironment` by running:

  ```
  cd /path/to/wge/
  pytest wge/tests/miniwob/test_environment.py -s
  ```
- To train a model on a task, say `email-inbox-forward-nl`, run:

  ```
  python main.py configs/default-base.txt --task email-inbox-forward-nl
  ```

- Change the task name (the last argument) to train on other tasks.
- All training runs are managed by the `MiniWoBTrainingRuns` object.
- The most important methods on `MiniWoBTrainingRun` are:
  - `__init__`: the policy, the environment, demonstrations, etc., are all loaded here.
  - `train`: the actual training of the policy happens here.
During training, there are several key systems involved:

- the environment
- policies
  - the model policy
  - the exploration policy
- episode generators
  - the basic episode generator
  - the best-first episode generator
- the replay buffer
All environments implement the `Environment` interface. A policy interacts with the environment by calling the environment's `step` method and passing in actions.

Note that an environment object is batched: it actually represents a batch of environments, each running in parallel (so that we can train faster), as illustrated in the sketch below.
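For intuition, here is a minimal sketch of that batched convention. The class below is illustrative only; its name and signatures are assumptions, not the repo's actual `Environment` API:

```python
class ToyBatchedEnvironment:
    """Illustrative only: a batch of parallel environment instances
    that reset and step in lockstep. Not the repo's actual class."""

    def __init__(self, num_instances):
        self._states = [0] * num_instances  # toy per-instance state

    def reset(self):
        self._states = [0] * len(self._states)
        return list(self._states)  # one state per parallel instance

    def step(self, actions):
        # Expects one action per instance; returns one new state per instance.
        assert len(actions) == len(self._states)
        self._states = [s + a for s, a in zip(self._states, actions)]
        return list(self._states)

env = ToyBatchedEnvironment(num_instances=4)
states = env.reset()             # [0, 0, 0, 0]
states = env.step([1, 0, 2, 1])  # all instances advance together
```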
See the `Policy` interface. The most important methods are `act`, `update_from_episodes`, and `update_from_replay_buffer`.

Note that all of these methods are also batched (i.e., they operate on multiple episodes in parallel).
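As a rough sketch of the shape of this interface (the signatures below are assumptions, not the repo's actual definitions):

```python
from abc import ABC, abstractmethod

class PolicySketch(ABC):
    """Illustrative sketch of the Policy interface described above;
    method signatures are assumptions, not the repo's actual code."""

    @abstractmethod
    def act(self, states):
        """Return one action per state (batched across environments)."""

    @abstractmethod
    def update_from_episodes(self, episodes):
        """Update the policy from a batch of freshly generated episodes."""

    @abstractmethod
    def update_from_replay_buffer(self, replay_buffer):
        """Update the policy from episodes sampled from the replay buffer."""
```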
The model policy is the main one that we are trying to train. See `MiniWoBPolicy` as an example.
See the `EpisodeGenerator` interface. An `EpisodeGenerator` runs a `Policy` on an `Environment` to produce an `Episode`.
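Conceptually, an episode generator is a rollout loop like the hypothetical sketch below (names and signatures are assumptions; the repo's generators, e.g. the best-first one, are more sophisticated):

```python
def generate_episodes_sketch(policy, env, max_steps):
    """Hypothetical rollout loop: run a batched policy on a batched
    environment and collect one episode per parallel instance."""
    states = env.reset()
    episodes = [[] for _ in states]  # one transition list per instance
    for _ in range(max_steps):
        actions = policy.act(states)     # batched action selection
        next_states = env.step(actions)  # all instances advance in lockstep
        for episode, state, action in zip(episodes, states, actions):
            episode.append((state, action))
        states = next_states
    return episodes
```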
See the `ReplayBuffer` interface. A `ReplayBuffer` stores episodes produced by the exploration policy. The final model policy is trained off episodes sampled from the replay buffer.
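A minimal sketch of that pattern (illustrative only; the repo's actual `ReplayBuffer` may store and sample episodes differently):

```python
import random

class ReplayBufferSketch:
    """Illustrative only: keep the most recent episodes from the
    exploration policy and sample batches to train the model policy."""

    def __init__(self, max_size):
        self._episodes = []
        self._max_size = max_size

    def extend(self, episodes):
        self._episodes.extend(episodes)
        # Evict the oldest episodes once the buffer is over capacity.
        self._episodes = self._episodes[-self._max_size:]

    def sample(self, batch_size):
        return random.sample(self._episodes,
                             min(batch_size, len(self._episodes)))
```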
All configs are in the `configs` folder. They are specified in HOCON format. The arguments to `main.py` should be a list of paths to config files; `main.py` then merges these config files according to the rules explained here.
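For intuition, here is a minimal sketch of the usual HOCON merge behavior, using the `pyhocon` package; whether the repo uses `pyhocon` and exactly these rules is an assumption:

```python
# Illustrative only: standard HOCON merging, where values from a later
# config override values from an earlier one, key by key.
from pyhocon import ConfigFactory, ConfigTree

base = ConfigFactory.parse_string('train { batch_size = 32, learning_rate = 0.001 }')
override = ConfigFactory.parse_string('train { learning_rate = 0.01 }')

merged = ConfigTree.merge_configs(base, override)
print(merged['train']['learning_rate'])  # 0.01 -- the later file wins
print(merged['train']['batch_size'])     # 32 -- inherited from the base config
```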