This plug-in is part of itwinai. Its purpose is to provide integration with AI pipelines developed at HTW Berlin for the purposes of pulsar analysis and detection (radio-astronomy). Please visit the original repository for more technical details on the application.
Integration Author: Alex Krochak, FZJ
Plug-in description
itwinai implements a plugin architecture, allowing the community to independently develop sub-packages for itwinai. These sub-packages can later be installed from PyPI and imported into Python code as if they were part of itwinai. This is made possible by namespace packages.
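The namespace-package mechanism can be illustrated with a small, self-contained sketch. It does not reproduce the actual layout of itwinai or of this plug-in. The names demo_core, pulsar_demo and other_demo are hypothetical and chosen to avoid clashing with a real installation. Two separate directories each contribute one module to the same demo_core.plugins namespace. Because neither directory contains an __init__.py, Python treats demo_core and demo_core.plugins as implicit namespace packages (PEP 420) and merges both portions.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_portion(root: Path, module_name: str) -> None:
    """Create <root>/demo_core/plugins/<module_name>.py without any __init__.py."""
    package_dir = root / "demo_core" / "plugins"
    package_dir.mkdir(parents=True)
    (package_dir / f"{module_name}.py").write_text(f"NAME = {module_name!r}\n")


with tempfile.TemporaryDirectory() as tmp:
    # Two independent "distributions", each shipping one portion of the namespace.
    roots = [Path(tmp) / "dist_a", Path(tmp) / "dist_b"]
    make_portion(roots[0], "pulsar_demo")
    make_portion(roots[1], "other_demo")

    # Make both portions visible to the import system.
    sys.path[:0] = [str(root) for root in roots]
    importlib.invalidate_caches()
    try:
        # Both modules resolve under the same dotted namespace.
        pulsar_demo = importlib.import_module("demo_core.plugins.pulsar_demo")
        other_demo = importlib.import_module("demo_core.plugins.other_demo")
        loaded = sorted([pulsar_demo.NAME, other_demo.NAME])
    finally:
        for root in roots:
            sys.path.remove(str(root))

print(loaded)  # ['other_demo', 'pulsar_demo']
```

In a real installation the portions are not separate sys.path entries created by hand. pip places each package in the environment's site-packages, and the import system finds them from there.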
itwinai plugins are developed on the assumption that the plugin logic,
once installed alongside itwinai, will be accessible under
the itwinai.plugins.* namespace. Example:
from itwinai.plugins.pulsar import data, trainer
# Instantiate a class implemented by the plugin
my_dataset = data.PulsarDataset(...)
my_trainer = trainer.PulsarTrainer(...)
Plug-in installation
A straightforward way to install the plug-in on a local machine is as follows:
- Clone this repository on your local machine or server.
- Create a new Python virtual environment:
  python -m venv .venv
- Activate this virtual environment:
  source .venv/bin/activate
- (Recommended) Install uv for accelerated package management:
  pip install uv
  More information can be found in the uv documentation.
- Then run from the top directory:
  (uv) pip install .
  This will install the plug-in. NOTE: itwinai itself is also installed automatically, as it is a plug-in dependency (see pyproject.toml).
- Done! Now you can run itwinai from the CLI, for example:
  itwinai exec-pipeline +pipe_key=syndata_pipeline
  Alternatively, you can unpack exec.zip outside the plug-in directory and run exec.py. Make sure you are using the virtual environment created in the plug-in directory, but work outside the plug-in directory.
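After installation, a quick check can confirm that both itwinai and the plug-in are visible to the active interpreter. The following is a minimal sketch using only the standard library. The module name itwinai.plugins.pulsar is taken from the import example above, and the helper name is_importable is hypothetical.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if the module can be located by the active interpreter.

    For dotted names, find_spec imports the parent packages, so a missing
    parent surfaces as an ImportError, which is reported here as False.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    for name in ("itwinai", "itwinai.plugins.pulsar"):
        status = "found" if is_importable(name) else "NOT found"
        print(f"{name}: {status}")
```

If either module is reported as not found, the most likely cause is that the virtual environment is not activated in the current shell.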
Plug-in installation on Juwels-Booster
When installing on Juwels-Booster at FZJ, additional steps need to be taken.
- Load the necessary modules for Python 3.11.3 (recommended):
  module --force purge
  ml Stages/2024 GCCcore/.12.3.0 Python/3.11.3
  Verify the Python version with python --version.
- Clone this repository in your personal project folder.
- Create a new Python virtual environment:
  python -m venv .venv
- Activate this virtual environment:
  source .venv/bin/activate
  Verify that the Python and pip paths are correct with which pip and which python.
- (Recommended) Install uv for accelerated package management:
  pip install uv --no-cache-dir
  The --no-cache-dir argument is necessary whenever installing with pip, to prevent the ~/.cache folder from filling up your home quota. More information on uv can be found in the uv documentation.
- Then run from the top directory:
  (uv) pip install . --no-cache-dir
  This will install the plug-in. NOTE: itwinai itself is also installed automatically, as it is a plug-in dependency (see pyproject.toml).
- Extract exec.tar.gz outside the plug-in folder:
  tar -xvzf exec.tar.gz
  Navigate to the exec folder and test the plug-in execution:
  itwinai exec-pipeline +pipe_key=syndata_pipeline
When running from the exec folder, inspect the config.yaml and batch-jsc.sh files
and update them as needed. Set the correct path to your virtual environment .venv
within the batch-jsc.sh script, and make sure to choose the proper distributed strategy
in config.yaml (ddp is recommended).
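The version and path checks in the steps above (python --version, which python) can also be done from inside Python, which is handy at the start of a batch job. This is a minimal sketch. The function name environment_report is hypothetical, and the default expected version (3, 11) reflects the recommended Python 3.11.3.

```python
import sys


def environment_report(expected=(3, 11)) -> dict:
    """Summarize the interpreter in use and whether it runs inside a virtual environment."""
    return {
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        # Compare only major and minor version against the expectation.
        "version_matches": tuple(sys.version_info[:2]) == tuple(expected),
        # Inside a venv, sys.prefix points at the venv and differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "interpreter": sys.executable,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If in_virtualenv is False, or interpreter does not point inside your .venv folder, the environment was probably not activated in the current shell or batch script.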
Running from a configuration file
You can run the full pipeline sequence by executing the following commands locally. Please note that it is recommended to run these commands outside the plug-in repository for organizational reasons. You just need to make sure that the correct Python virtual environment is activated.
itwinai reads the pipeline selected by pipe_key from the config.yaml file.
- Generate the synthetic data:
  itwinai exec-pipeline +pipe_key=syndata_pipeline
- Initialize and train a UNet model:
  itwinai exec-pipeline +pipe_key=unet_pipeline
- Initialize and train a FilterCNN model:
  itwinai exec-pipeline +pipe_key=fcnn_pipeline
- Initialize and train a CNN1D model:
  itwinai exec-pipeline +pipe_key=cnn1d_pipeline
- Compile a full pipeline and test it:
  itwinai exec-pipeline +pipe_key=evaluate_pipeline
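The five commands above can also be chained from a small driver script that stops at the first failure. This is a sketch and not part of the plug-in. The names PIPELINE_KEYS, build_command and run_all are hypothetical, while the CLI invocation and the pipeline keys are the ones listed above. The runner argument exists only so the sequence can be exercised without actually launching itwinai.

```python
import subprocess
from typing import Callable, List, Sequence

# Pipeline keys in the order given above.
PIPELINE_KEYS = (
    "syndata_pipeline",
    "unet_pipeline",
    "fcnn_pipeline",
    "cnn1d_pipeline",
    "evaluate_pipeline",
)


def build_command(pipe_key: str) -> List[str]:
    """Build the itwinai CLI call for one pipeline key."""
    return ["itwinai", "exec-pipeline", f"+pipe_key={pipe_key}"]


def run_all(
    keys: Sequence[str] = PIPELINE_KEYS,
    runner: Callable = subprocess.run,
) -> List[str]:
    """Run the pipelines in order and return the keys that completed.

    With the default runner, check=True raises CalledProcessError on a
    non-zero exit code, so later pipelines do not run after a failure.
    """
    completed = []
    for key in keys:
        runner(build_command(key), check=True)
        completed.append(key)
    return completed


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for key in PIPELINE_KEYS:
        print(" ".join(build_command(key)))
```

Calling run_all() with no arguments executes the real commands, so it should be run from the exec folder with the virtual environment activated.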
When running on HPC, you can use the batch-jsc.sh SLURM script to run these commands.
Please make sure you select an appropriate strategy in config.yaml when running on HPC.
The recommended strategy is ddp.
Logging with MLflow
By default, config.yaml enables MLflow logging during training.
During or after the run, you can launch an MLflow server by executing
  mlflow server --backend-store-uri mllogs/mlflow
and then opening http://127.0.0.1:5000/ in your browser.
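If port 5000 is already taken, for example on a shared login node, the server can be started on another port. The sketch below only assembles the command and the matching URL, and probes whether something is listening. The helper names are hypothetical. The --host and --port options belong to the standard mlflow server command, and the defaults shown match the address used above.

```python
import socket
from typing import List


def mlflow_server_command(
    store: str = "mllogs/mlflow", host: str = "127.0.0.1", port: int = 5000
) -> List[str]:
    """Assemble the mlflow server call for a given backend store, host and port."""
    return [
        "mlflow", "server",
        "--backend-store-uri", store,
        "--host", host,
        "--port", str(port),
    ]


def ui_url(host: str = "127.0.0.1", port: int = 5000) -> str:
    """Return the address to open in the browser."""
    return f"http://{host}:{port}/"


def is_listening(host: str = "127.0.0.1", port: int = 5000, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print(" ".join(mlflow_server_command(port=5001)))
    print(ui_url(port=5001))
```

The command list can be passed to subprocess.Popen to start the server in the background, after which is_listening can be polled before the browser is opened.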
Test suite
The test suite is located in the tests/ folder.
Before running the test suite, you should make sure that the pytorch fixture in:
tests/test_pulsar.py:torch_env()
is correctly defined and corresponds to the virtual environment where itwinai is
installed on your system.
The test suite contains integration tests for each of the five pipelines listed above. Its configuration
and execution are defined in tests/test_pulsar.py and
in the configuration file tests/.config-test.yaml.
If you are updating the test suite, make sure you update both of these files.