Easily access datasets on Rucio data lake #156

@matbun

Description

Add a Python function that translates a namespaced Rucio dataset/file into its absolute path on the local filesystem of the datacenter (e.g., HPC) on which the code is currently running.

Something like namespace_to_path('jdoe:physics_dataset'), returning:

  • '/dacache/slling.si/.../physics_dataset' when on HPC1
  • '/other/path/.../physics_dataset' when on HPC2
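To make the intended interface concrete, here is a minimal sketch of the name-handling part only. The site keys, the root directories in SITE_PREFIXES, and the helpers split_did and naive_local_path are hypothetical placeholders, not part of itwinai or Rucio. The sketch also assumes a plain prefix/scope/name layout, which may not match how a given RSE actually maps names to paths.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Hypothetical per-site root directories (assumption): the real values
# depend on how each local RSE is configured.
SITE_PREFIXES = {
    "HPC1": "/hypothetical/hpc1/rucio",
    "HPC2": "/hypothetical/hpc2/rucio",
}


def split_did(did: str) -> tuple[str, str]:
    """Split a namespaced identifier 'scope:name' into (scope, name)."""
    scope, sep, name = did.partition(":")
    if not sep or not scope or not name:
        raise ValueError(f"Expected '<scope>:<name>', got {did!r}")
    return scope, name


def naive_local_path(did: str, site: str) -> str:
    """Join the site prefix, scope and name (assumed layout, see lead-in)."""
    scope, name = split_did(did)
    return str(PurePosixPath(SITE_PREFIXES[site]) / scope / name)
```

The same identifier resolves to a different path depending on the site key, which is the behavior the two bullets above describe.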

The dataset may or may not be present on the HPC system:

  • When the dataset is available on the local RSE, return a list of paths to the dataset files (or just the path to the root directory of the dataset, if you prefer).
  • When the dataset is not available on the local RSE, create a Rucio rule to copy it there asynchronously, then raise a custom exception informing the user that the dataset is not present yet and that the job cannot continue, although a rule has been created.
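The two cases above could be sketched as follows. This is only an illustration under stated assumptions: the exception name DatasetNotAvailableError and the RucioLikeClient protocol are hypothetical, and the two client methods are modeled on the Rucio Python client's list_replicas and add_replication_rule, whose exact signatures and return shapes should be checked against the Rucio documentation. The sketch treats any replica on the local RSE as "available" and returns whatever the client reports for that RSE; turning those entries into absolute filesystem paths may need an extra site-specific step.

```python
from __future__ import annotations

from typing import Any, Iterable, Protocol


class DatasetNotAvailableError(RuntimeError):
    """Hypothetical exception: dataset missing locally, copy rule created."""

    def __init__(self, did: str, rse: str, rule_ids: list[str]):
        self.did, self.rse, self.rule_ids = did, rse, rule_ids
        super().__init__(
            f"{did} is not available on {rse} yet; "
            f"replication rule(s) {rule_ids} created, retry later."
        )


class RucioLikeClient(Protocol):
    """Minimal interface assumed here, modeled on the Rucio client."""

    def list_replicas(
        self, dids: list[dict[str, str]], rse_expression: str | None = None
    ) -> Iterable[dict[str, Any]]: ...

    def add_replication_rule(
        self, dids: list[dict[str, str]], copies: int, rse_expression: str
    ) -> list[str]: ...


def namespace_to_path(did: str, local_rse: str, client: RucioLikeClient) -> list[str]:
    """Return the local replica paths of `did`, or request a copy and raise."""
    scope, sep, name = did.partition(":")
    if not (sep and scope and name):
        raise ValueError(f"Expected '<scope>:<name>', got {did!r}")
    dids = [{"scope": scope, "name": name}]

    # Assumed reply shape: each replica maps RSE name -> list of locations.
    paths: list[str] = []
    for replica in client.list_replicas(dids, rse_expression=local_rse):
        paths.extend(replica.get("rses", {}).get(local_rse, []))
    if paths:
        return sorted(paths)

    # Nothing on the local RSE: ask for an asynchronous copy, then stop the job.
    rule_ids = client.add_replication_rule(dids, copies=1, rse_expression=local_rse)
    raise DatasetNotAvailableError(did, local_rse, list(rule_ids))
```

Passing the client in as an argument keeps the function easy to test with a fake object, which fits the plan to add tests without requiring access to a live Rucio server.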

How to proceed:

  • Create a rucio.py module under src/itwinai/ containing the Python function that converts a Rucio dataset to its absolute path on the local RSE
  • Add tests in a test_rucio.py file under tests/

Once this is done, we will integrate it with other itwinai modules (e.g., config parser and CLI).
