Commit 151020e

Addresses SMART-Lab#160
1 parent e8a7d0a commit 151020e

3 files changed: 7 additions, 14 deletions

docs/source/autoresume.rst

Lines changed: 1 addition & 1 deletion
@@ -10,5 +10,5 @@ tasks as soon as they hit the walltime. The caveat here is that your tasks
 **must be resumable**, i.e. be capable of restoring their state after being
 killed and rerun.
 
-You can engage the autoresumption by passing ``-m`` or ``--autoresume`` during
+You can engage the autoresumption by passing ``-r`` or ``--autoresume`` during
 ``smart-dispatch`` execution. See :doc:`usage` for details.

docs/source/usage.rst

Lines changed: 6 additions & 5 deletions
@@ -11,10 +11,11 @@ Hierarchy of generated files
 
 In order to understand the contents of the generated folders/files, it's good to know how ``smart-dispatch`` deals with **commands** that a user requests to launch on the cluster:
 
-* Each invokation of ``smart-dispatch`` creates a so-called **batch** of **jobs**. Smart Dispatch will do its best to create as many simultaneous jobs so as to effecitvely utilze the allocated resources.
+* Smart Dispatch will distribute commands to jobs such that each of the latter uses an entire node. Jobs may run many commands concurrently if necessary to use a maximum number of cores and GPUs. The distribution is based on number of cores per node / per command and number of GPUs per node / per command.
+
 * Each job is basically a single PBS file that is run by the queue management system on the cluster (either ``msub`` or ``qsub``).
-* A job spawns mulitple concurrent **workers** that all cooperate to execute the requested commands.
-* Each worker (basically, a python script) is executing commands sequentially.
+* A job spawns multiple concurrent **workers** that all cooperate to execute the requested commands.
+* Each worker is executing commands sequentially.
 
 A typical hierarchy of ``./SMART_DISPATCH_LOGS/{batch_id}/`` should look like this: ::
 
@@ -58,7 +59,7 @@ Now let's go through the subdirectories.
 This directory holds generated PBS files (``job_commands_{pbs_index}.sh``) as well as three command lists:
 
 ``commands.txt``:
-A list pending commands (this is where the workers are taking their next commands to execute from).
+A list of pending commands (this is where the workers are taking their next commands to execute from).
 ``running_commands.txt``:
 A list of currently running commands.
 ``failed_commands.txt``:
@@ -68,7 +69,7 @@ This directory holds generated PBS files (``job_commands_{pbs_index}.sh``) as we
 ``logs/``
 ^^^^^^^^^
 
-Output and error logs in are saved in this directory. The root level contains logs for actual commands. There are also two additional subfolder:
+Output and error logs are saved in this directory. The root level contains logs for actual commands. There are also two additional subfolders:
 
 ``job/``:
 Holds logs for the PBS files.

smartdispatch.sublime-project

Lines changed: 0 additions & 8 deletions
This file was deleted.
