Commit 0d66184

CLI documentation (#743)
Add documentation for the command line interface.
1 parent b452384 commit 0d66184

13 files changed (+211, -11 lines)

docs/getting-started/cli.rst

Lines changed: 166 additions & 0 deletions
@@ -0,0 +1,166 @@
1+
======================
2+
Command Line Interface
3+
======================
4+
5+
Many features of the core library are accessible via the command line interface built
6+
using the `Sacred <https://github.com/idsia/sacred>`_ package.
7+
8+
Sacred is used to configure and run the algorithms.
9+
It is centered around the concept of `experiments <https://sacred.readthedocs.io/en/stable/experiment.html>`_
10+
which are composed of reusable `ingredients <https://sacred.readthedocs.io/en/stable/ingredients.html>`_.
11+
Each experiment and each ingredient has its own configuration namespace.
12+
Named configurations are used to specify a coherent set of configuration values.
13+
It is recommended to at least read the
14+
`Sacred documentation about the command line interface <https://sacred.readthedocs.io/en/stable/command_line.html>`_.
15+
16+
The :py:mod:`scripts <imitation.scripts>` package contains a number of Sacred experiments that either execute algorithms or perform utility tasks.
17+
The most important :py:mod:`ingredients <imitation.scripts.ingredients>` for imitation learning are:
18+
19+
- :py:mod:`Environments <imitation.scripts.ingredients.environment>`
20+
- :py:mod:`Expert Policies <imitation.scripts.ingredients.expert>`
21+
- :py:mod:`Expert Demonstrations <imitation.scripts.ingredients.demonstrations>`
22+
- :py:mod:`Reward Functions <imitation.scripts.ingredients.reward>`
23+
24+
25+
Usage Examples
26+
==============
27+
28+
Here we demonstrate some usage examples for the command line interface.
29+
You can always find out all the configurable values by running:
30+
31+
.. code-block:: bash
32+
33+
python -m imitation.scripts.<script> print_config
34+
35+
Run BC on the ``CartPole-v1`` environment with a pre-trained PPO policy as expert:
36+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
37+
38+
.. note:: Here the cartpole environment is specified via a named configuration.
39+
40+
.. code-block:: bash
41+
42+
python -m imitation.scripts.train_imitation bc with \
43+
cartpole \
44+
demonstrations.n_expert_demos=50 \
45+
bc.train_kwargs.n_batches=2000 \
46+
expert.policy_type=ppo \
47+
expert.loader_kwargs.path=tests/testdata/expert_models/cartpole_0/policies/final/model.zip
48+
49+
This samples 50 expert demonstrations from the PPO policy that is included in the testdata folder.
50+
Training for 2000 batches is enough to obtain a good policy.
51+
52+
Run DAgger on the ``CartPole-v1`` environment with a random policy as expert:
53+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
54+
55+
.. code-block:: bash
56+
57+
python -m imitation.scripts.train_imitation dagger with \
58+
cartpole \
59+
dagger.total_timesteps=2000 \
60+
demonstrations.n_expert_demos=10 \
61+
expert.policy_type=random
62+
63+
This will not produce any meaningful results, since a random policy is not a good expert.
64+
65+
66+
Run AIRL on the ``MountainCar-v0`` environment with a expert from the HuggingFace model hub:
67+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
68+
69+
.. code-block:: bash
70+
71+
python -m imitation.scripts.train_adversarial airl with \
72+
seals_mountain_car \
73+
total_timesteps=5000 \
74+
expert.policy_type=ppo-huggingface \
75+
demonstrations.n_expert_demos=500
76+
77+
.. note:: The small number of total timesteps is only for demonstration purposes and will not produce a good policy.
78+
79+
80+
Run GAIL on the ``seals/Swimmer-v0`` environment
81+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
82+
83+
Here we do not use the named configuration for the seals environment, but instead specify the ``gym_id`` directly.
84+
The ``seals:`` prefix ensures that the seals package is imported and the environment is registered.
85+
86+
.. note:: The Swimmer environment needs ``mujoco_py`` to be installed.
87+
88+
.. code-block:: bash
89+
90+
python -m imitation.scripts.train_adversarial gail with \
91+
environment.gym_id="seals:seals/Swimmer-v0" \
92+
total_timesteps=5000 \
93+
demonstrations.n_expert_demos=50
94+
95+
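
The effect of the ``seals:`` prefix can be pictured with a small sketch. It assumes the ``module:env_id`` convention, in which everything before the first colon names a Python module to import before the environment is looked up. ``split_gym_id`` is a hypothetical helper written for illustration only; it is not part of imitation, seals, or gym.

```python
from typing import Optional, Tuple


def split_gym_id(gym_id: str) -> Tuple[Optional[str], str]:
    """Split "module:env_id" into (module, env_id).

    Hypothetical helper: the module part is None when there is no prefix.
    """
    module, sep, env_id = gym_id.partition(":")
    if not sep:
        # No colon present, so there is no module to import first.
        return None, gym_id
    return module, env_id


print(split_gym_id("seals:seals/Swimmer-v0"))  # ('seals', 'seals/Swimmer-v0')
print(split_gym_id("CartPole-v1"))  # (None, 'CartPole-v1')
```

Without the prefix, the seals environments would only be available if something else had already imported the package.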
96+
Algorithm Scripts
97+
=================
98+
99+
Call the algorithm scripts like this:
100+
101+
.. code-block:: bash
102+
103+
python -m imitation.scripts.<script> [command] with <named_config> <config_values>
104+
105+
+---------------------------------+------------------------------+----------+
106+
| algorithm | script | command |
107+
+=================================+==============================+==========+
108+
| BC | train_imitation | bc |
109+
+---------------------------------+------------------------------+----------+
110+
| DAgger | train_imitation | dagger |
111+
+---------------------------------+------------------------------+----------+
112+
| AIRL | train_adversarial | airl |
113+
+---------------------------------+------------------------------+----------+
114+
| GAIL | train_adversarial | gail |
115+
+---------------------------------+------------------------------+----------+
116+
| Preference Comparison | train_preference_comparisons | - |
117+
+---------------------------------+------------------------------+----------+
118+
| MCE IRL | none | - |
119+
+---------------------------------+------------------------------+----------+
120+
| Density Based Reward Estimation | none | - |
121+
+---------------------------------+------------------------------+----------+
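
The call pattern and the table above can be combined into a small helper that assembles the argument list for a run. This is only a sketch: ``ALGORITHM_SCRIPTS`` and ``build_command`` are hypothetical names, not part of imitation, and the mapping simply copies the table rows that have a script.

```python
from typing import Dict, List, Mapping, Optional, Sequence, Tuple

# (script, command) pairs copied from the table; None means "no sub-command".
ALGORITHM_SCRIPTS: Dict[str, Tuple[str, Optional[str]]] = {
    "BC": ("train_imitation", "bc"),
    "DAgger": ("train_imitation", "dagger"),
    "AIRL": ("train_adversarial", "airl"),
    "GAIL": ("train_adversarial", "gail"),
    "Preference Comparison": ("train_preference_comparisons", None),
}


def build_command(
    algorithm: str,
    named_configs: Sequence[str] = (),
    config_values: Optional[Mapping[str, object]] = None,
) -> List[str]:
    """Assemble ``python -m imitation.scripts.<script> [command] with ...``."""
    script, command = ALGORITHM_SCRIPTS[algorithm]
    argv = ["python", "-m", f"imitation.scripts.{script}"]
    if command is not None:
        argv.append(command)
    updates = [f"{key}={value}" for key, value in (config_values or {}).items()]
    if named_configs or updates:
        # Named configs and key=value updates both follow the "with" keyword.
        argv += ["with", *named_configs, *updates]
    return argv


print(build_command("BC", ["cartpole"], {"demonstrations.n_expert_demos": 50}))
```

The resulting list can be handed to ``subprocess.run`` as is, which avoids shell quoting issues for values such as ``seals:seals/Swimmer-v0``.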
122+
123+
Utility Scripts
124+
===============
125+
126+
Call the utility scripts like this:
127+
128+
.. code-block:: bash
129+
130+
python -m imitation.scripts.<script>
131+
132+
+-----------------------------------------+-----------------------------------------------------------+
133+
| Functionality | Script |
134+
+=========================================+===========================================================+
135+
| Reinforcement Learning | :py:mod:`train_rl <imitation.scripts.train_rl>` |
136+
+-----------------------------------------+-----------------------------------------------------------+
137+
| Evaluating a Policy | :py:mod:`eval_policy <imitation.scripts.eval_policy>` |
138+
+-----------------------------------------+-----------------------------------------------------------+
139+
| Parallel Execution of Algorithm Scripts | :py:mod:`parallel <imitation.scripts.parallel>` |
140+
+-----------------------------------------+-----------------------------------------------------------+
141+
| Converting Trajectory Formats | :py:mod:`convert_trajs <imitation.scripts.convert_trajs>` |
142+
+-----------------------------------------+-----------------------------------------------------------+
143+
| Analyzing Experimental Results | :py:mod:`analyze <imitation.scripts.analyze>` |
144+
+-----------------------------------------+-----------------------------------------------------------+
145+
146+
147+
Output Directories
148+
==================
149+
150+
The results of the script runs are stored in the following directory structure:
151+
152+
.. code-block::
153+
154+
output
155+
├── <algo>
156+
│ └── <environment>
157+
│ └── <timestamp>
158+
│ ├── log
159+
│ ├── monitor
160+
│ └── sacred -> ../../../sacred/<script_name>/1
161+
└── sacred
162+
└── <script_name>
163+
├── 1
164+
└── _sources
165+
166+
It contains the final model, tensorboard logs, sacred logs and the sacred source files.
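
Given this layout, the most recent run for an algorithm and environment can be located with a few lines of ``pathlib``. The sketch below is an illustration under two assumptions that the text above does not confirm: the ``<timestamp>`` directory names sort chronologically as strings, and ``<environment>`` is a single directory name. ``latest_run_dir`` is a hypothetical helper, not part of imitation.

```python
import pathlib
from typing import Optional, Union


def latest_run_dir(
    output_root: Union[str, pathlib.Path],
    algo: str,
    environment: str,
) -> Optional[pathlib.Path]:
    """Return output/<algo>/<environment>/<timestamp> for the newest run.

    Assumes timestamp directory names sort chronologically as strings.
    Returns None if no run directory exists.
    """
    base = pathlib.Path(output_root) / algo / environment
    if not base.is_dir():
        return None
    runs = sorted(p for p in base.iterdir() if p.is_dir())
    return runs[-1] if runs else None
```

For example, ``latest_run_dir("output", "<algo>", "<environment>")`` would return the newest ``<timestamp>`` directory, whose ``sacred`` entry links to the matching Sacred run.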

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -47,6 +47,7 @@ If you use ``imitation`` in your research project, please cite our paper to help
4747
getting-started/what-is-imitation
4848
getting-started/variable-horizon
4949
getting-started/first-steps
50+
getting-started/cli
5051

5152
.. toctree::
5253
:maxdepth: 2
Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
1-
"""Ingredients for scripts."""
1+
"""Ingredients for Sacred experiments."""

src/imitation/scripts/ingredients/bc.py

Lines changed: 4 additions & 1 deletion
@@ -1,4 +1,7 @@
1-
"""Ingredients for training a BC policy."""
1+
"""This ingredient provides a BC algorithm instance.
2+
3+
It is either loaded from disk or constructed from scratch.
4+
"""
25
import warnings
36
from typing import Optional, Sequence
47

src/imitation/scripts/ingredients/demonstrations.py

Lines changed: 5 additions & 1 deletion
@@ -1,4 +1,8 @@
1-
"""Ingredient for scripts learning from demonstrations."""
1+
"""This ingredient provides (expert) demonstrations to learn from.
2+
3+
The demonstrations are either loaded from disk, from the HuggingFace Dataset Hub, or
4+
sampled from the expert policy provided by the expert ingredient.
5+
"""
26

37
import logging
48
from typing import Any, Dict, Optional, Sequence

src/imitation/scripts/ingredients/environment.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
1-
"""Environment Ingredient for sacred experiments."""
1+
"""This ingredient provides a vectorized gym environment."""
22
import contextlib
33
from typing import Any, Generator, Mapping
44

src/imitation/scripts/ingredients/expert.py

Lines changed: 16 additions & 1 deletion
@@ -1,4 +1,19 @@
1-
"""Common configuration elements for loading of expert policies."""
1+
"""This ingredient provides an expert policy.
2+
3+
The expert policy is loaded from disk, loaded from the HuggingFace Model Hub, or is
4+
a test policy (e.g., random or zero).
5+
The supported policy types are:
6+
7+
- :code:`ppo` and :code:`sac`: A policy trained with SB3.
8+
Needs a :code:`path` in the :code:`loader_kwargs`.
9+
- :code:`<algo>-huggingface` (algo can be :code:`ppo` or :code:`sac`):
10+
A policy trained with SB3 and uploaded to the HuggingFace Model Hub.
11+
Will load the model from the repo :code:`<organization>/<algo>-<env_name>`.
12+
You can set the organization with the :code:`organization` key in :code:`loader_kwargs`.
13+
The default is :code:`HumanCompatibleAI`.
14+
- :code:`random`: A policy that takes random actions.
15+
- :code:`zero`: A policy that takes zero actions.
16+
"""
217
import sacred
318

419
from imitation.policies import serialize

src/imitation/scripts/ingredients/logging.py

Lines changed: 5 additions & 1 deletion
@@ -1,4 +1,8 @@
1-
"""Logging ingredient for scripts."""
1+
"""This ingredient provides a number of logging utilities.
2+
3+
It is responsible for logging to WandB, TensorBoard, and stdout.
4+
It will also create a symlink to the sacred logging directory in the log directory.
5+
"""
26

37
import logging
48
import pathlib

src/imitation/scripts/ingredients/policy.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
1-
"""Ingredient implementation for a SB3 policy."""
1+
"""This ingredient provides a newly constructed stable-baselines3 policy."""
22

33
import logging
44
from typing import Any, Mapping, Type

src/imitation/scripts/ingredients/policy_evaluation.py

Lines changed: 5 additions & 1 deletion
@@ -1,4 +1,8 @@
1-
"""Sacred ingredient for evaluating a policy on a VecEnv."""
1+
"""This ingredient performs evaluation of a learned policy.
2+
3+
It applies the appropriate wrappers, performs rollouts,
4+
and computes statistics over those rollouts.
5+
"""
26

37
from typing import Mapping, Union
48
