======================
Command Line Interface
======================

Many features of the core library are accessible via the command line interface built
using the `Sacred <https://github.com/idsia/sacred>`_ package.

Sacred is used to configure and run the algorithms.
It is centered around the concept of `experiments <https://sacred.readthedocs.io/en/stable/experiment.html>`_,
which are composed of reusable `ingredients <https://sacred.readthedocs.io/en/stable/ingredients.html>`_.
Each experiment and each ingredient has its own configuration namespace.
Named configurations are used to specify a coherent set of configuration values.
It is recommended to at least read the
`Sacred documentation about the command line interface <https://sacred.readthedocs.io/en/stable/command_line.html>`_.

The :py:mod:`scripts <imitation.scripts>` package contains a number of Sacred experiments that either execute algorithms or perform utility tasks.
The most important :py:mod:`ingredients <imitation.scripts.ingredients>` for imitation learning are:

- :py:mod:`Environments <imitation.scripts.ingredients.environment>`
- :py:mod:`Expert Policies <imitation.scripts.ingredients.expert>`
- :py:mod:`Expert Demonstrations <imitation.scripts.ingredients.demonstrations>`
- :py:mod:`Reward Functions <imitation.scripts.ingredients.reward>`
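Values in these ingredient namespaces are set from the command line with Sacred's dotted-path syntax, combined with named configurations. As a sketch of the pattern (using only config keys that appear in the examples below):

.. code-block:: bash

    # Apply the "cartpole" named configuration, then override single
    # values in the "demonstrations" and "expert" ingredient namespaces:
    python -m imitation.scripts.train_imitation bc with \
        cartpole \
        demonstrations.n_expert_demos=10 \
        expert.policy_type=random
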


Usage Examples
==============

Here we demonstrate some usage examples for the command line interface.
You can always find out all the configurable values by running:

.. code-block:: bash

    python -m imitation.scripts.<script> print_config

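Since ``print_config`` is a standard Sacred command, it also accepts named configurations and config updates, which is useful for checking what a full command will do before actually running it:

.. code-block:: bash

    # Show the effective configuration after applying the "cartpole"
    # named config and a config update, without running the experiment:
    python -m imitation.scripts.train_imitation print_config with \
        cartpole \
        demonstrations.n_expert_demos=50
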
Run BC on the ``CartPole-v1`` environment with a pre-trained PPO policy as expert:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. note:: Here the CartPole environment is specified via the ``cartpole`` named configuration.

.. code-block:: bash

    python -m imitation.scripts.train_imitation bc with \
        cartpole \
        demonstrations.n_expert_demos=50 \
        bc.train_kwargs.n_batches=2000 \
        expert.policy_type=ppo \
        expert.loader_kwargs.path=tests/testdata/expert_models/cartpole_0/policies/final/model.zip

Here, 50 expert demonstrations are sampled from the PPO policy included in the testdata folder, and 2000 training batches are enough to learn a good policy.

Run DAgger on the ``CartPole-v0`` environment with a random policy as expert:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

    python -m imitation.scripts.train_imitation dagger with \
        cartpole \
        dagger.total_timesteps=2000 \
        demonstrations.n_expert_demos=10 \
        expert.policy_type=random

This will not produce any meaningful results, since a random policy is not a good expert.


Run AIRL on the ``MountainCar-v0`` environment with an expert from the HuggingFace model hub:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

    python -m imitation.scripts.train_adversarial airl with \
        seals_mountain_car \
        total_timesteps=5000 \
        expert.policy_type=ppo-huggingface \
        demonstrations.n_expert_demos=500

.. note:: The small number of total timesteps is only for demonstration purposes and will not produce a good policy.


Run GAIL on the ``seals/Swimmer-v0`` environment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here we do not use the named configuration for the seals environment, but instead specify the ``gym_id`` directly.
The ``seals:`` prefix ensures that the seals package is imported and the environment is registered.

.. note:: The Swimmer environment needs ``mujoco_py`` to be installed.

.. code-block:: bash

    python -m imitation.scripts.train_adversarial gail with \
        environment.gym_id="seals:seals/Swimmer-v0" \
        total_timesteps=5000 \
        demonstrations.n_expert_demos=50


Algorithm Scripts
=================

Call the algorithm scripts like this:

.. code-block:: bash

    python -m imitation.scripts.<script> [command] with <named_config> <config_values>

+---------------------------------+------------------------------+----------+
| algorithm                       | script                       | command  |
+=================================+==============================+==========+
| BC                              | train_imitation              | bc       |
+---------------------------------+------------------------------+----------+
| DAgger                          | train_imitation              | dagger   |
+---------------------------------+------------------------------+----------+
| AIRL                            | train_adversarial            | airl     |
+---------------------------------+------------------------------+----------+
| GAIL                            | train_adversarial            | gail     |
+---------------------------------+------------------------------+----------+
| Preference Comparison           | train_preference_comparisons | -        |
+---------------------------------+------------------------------+----------+
| MCE IRL                         | none                         | -        |
+---------------------------------+------------------------------+----------+
| Density Based Reward Estimation | none                         | -        |
+---------------------------------+------------------------------+----------+
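The preference comparison script has no sub-command, so it is invoked with ``with`` directly. A minimal invocation might look like the following sketch; the ``cartpole`` named config and the ``total_timesteps`` key are assumed here by analogy with the examples above, so check ``print_config`` for the actual keys:

.. code-block:: bash

    # Hypothetical minimal run of the preference comparison script;
    # verify the config keys with print_config before running.
    python -m imitation.scripts.train_preference_comparisons with \
        cartpole \
        total_timesteps=5000
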

Utility Scripts
===============

Call the utility scripts like this:

.. code-block:: bash

    python -m imitation.scripts.<script>

+-----------------------------------------+-----------------------------------------------------------+
| Functionality                           | Script                                                    |
+=========================================+===========================================================+
| Reinforcement Learning                  | :py:mod:`train_rl <imitation.scripts.train_rl>`           |
+-----------------------------------------+-----------------------------------------------------------+
| Evaluating a Policy                     | :py:mod:`eval_policy <imitation.scripts.eval_policy>`     |
+-----------------------------------------+-----------------------------------------------------------+
| Parallel Execution of Algorithm Scripts | :py:mod:`parallel <imitation.scripts.parallel>`           |
+-----------------------------------------+-----------------------------------------------------------+
| Converting Trajectory Formats           | :py:mod:`convert_trajs <imitation.scripts.convert_trajs>` |
+-----------------------------------------+-----------------------------------------------------------+
| Analyzing Experimental Results          | :py:mod:`analyze <imitation.scripts.analyze>`             |
+-----------------------------------------+-----------------------------------------------------------+
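The utility scripts are Sacred experiments as well, so their configurable values can be inspected in the same way as those of the algorithm scripts:

.. code-block:: bash

    # List the configurable values of the policy evaluation script:
    python -m imitation.scripts.eval_policy print_config
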


Output Directories
==================

The results of the script runs are stored in the following directory structure:

.. code-block::

    output
    ├── <algo>
    │   └── <environment>
    │       └── <timestamp>
    │           ├── log
    │           ├── monitor
    │           └── sacred -> ../../../sacred/<script_name>/1
    └── sacred
        └── <script_name>
            ├── 1
            └── _sources

It contains the final trained model, the TensorBoard logs, the Sacred run logs, and the Sacred source files.
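Since the TensorBoard logs are written below each run directory, training progress can be inspected with the standard TensorBoard CLI; the concrete path depends on the algorithm, environment, and timestamp of the run:

.. code-block:: bash

    # Point TensorBoard at the log directory of a particular run;
    # replace <algo>, <environment>, and <timestamp> with actual values.
    tensorboard --logdir "output/<algo>/<environment>/<timestamp>/log"
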