Description
In #377 we are introducing the concept of compactable dimensions, together with a compact format that allows lists whose data is constant along one axis to be transferred efficiently.
This is a critical requirement for the vast majority of trajectories we want to support.
For example, the number of atoms, the list of atoms and elements, and in some cases also the `lattice_vectors` do not change over a simulation. It is therefore useful to be able to communicate that the value is constant and to transfer it only once.
The current specification provides a way to return such a compact format (essentially by returning only one element, which means that the value is constant; see specs).
The specs leave open the option of defining other formats in the future.
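As a concrete illustration of the existing constant format, the sketch below shows how a client could expand a compact value back to its full length. It assumes that the compact payload is simply a one-element list and that the client already knows the length of the compacted dimension (e.g. the number of frames). The function name `expand_constant` is hypothetical and not part of the specification.

```python
import copy
from typing import Any, List


def expand_constant(values: List[Any], n: int) -> List[Any]:
    """Expand a compact list in the (assumed) 'constant' layout to length n.

    Assumes the compact form is a list holding exactly one element that
    applies to every index along the compacted dimension, and that the
    client already knows the full length n (e.g. the number of frames).
    """
    if len(values) != 1:
        raise ValueError("constant format expects exactly one element")
    # Deep-copy each entry so that nested values (e.g. a 3x3 lattice matrix)
    # are not the same object shared across frames.
    return [copy.deepcopy(values[0]) for _ in range(n)]


# Lattice vectors that stay fixed over a 3-frame trajectory, sent only once.
compact_lattice = [[[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]]
frames = expand_constant(compact_lattice, 3)
```

Note that the constant value can itself be a nested list, which is how a constant 3x3 lattice matrix is covered.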
This issue serves two goals: keeping a list of other use cases that might benefit from similar (but different) compact formats, and collecting ideas on how these could be implemented in a backward-compatible way.
Note: this is the outcome of a discussion at the 2025 OPTIMADE workshop involving several people, including @gmrigna @rartino @sauliusg
Use cases
- (already covered) a list where the value is constant for the whole dimension (`constant` format). E.g., a property that is constant for the whole trajectory. (Note that this also includes a list, or a list of lists, being constant, e.g. the 3x3 matrix of lattice vectors being constant.)
- a value changing linearly (where one specifies the value at index 0 and the "slope" between indices i and i+1; common for the timestep in an MD simulation). This could actually be achieved by providing the first TWO elements and clarifying that the remaining ones are linearly extrapolated. The advantage is that the two values have no special meaning (e.g. the second one being a "gradient"): they are simply the first two values.
- a value that is constant over ranges of indices: e.g. value1 for indices 0-99, value2 for indices 100-199, ... [in this case, we could probably associate a set of values with a set of non-overlapping slice specifications (start, stop, step), with all remaining indices assumed to be null]. E.g., to indicate the thermostat temperature in a simulation, which is set only in some ranges of indices and to different values in each.
- possibly, a format for a few sparse values? E.g., a property measured or computed for only a few frames.
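To make the linear and range-based use cases above more concrete, here is a minimal sketch of how a client might expand them. The payload shapes are assumptions, since neither format is specified yet: the linear form is assumed to be a list of exactly two values (the actual values at indices 0 and 1), and the range form a list of `((start, stop, step), value)` pairs with `stop` exclusive, as in Python slices. The names `expand_linear` and `expand_ranges` are hypothetical.

```python
from typing import Any, List, Optional, Sequence, Tuple

# (start, stop, step) with stop exclusive, as in Python slices (an assumption).
Slice = Tuple[int, int, int]


def expand_linear(values: Sequence[float], n: int) -> List[float]:
    """Expand a hypothetical 'linear' compact list of exactly two elements.

    The two elements are the real values at indices 0 and 1; every later
    index is extrapolated using the same difference between neighbours.
    """
    if len(values) != 2:
        raise ValueError("linear format expects exactly two elements")
    v0, v1 = values
    delta = v1 - v0
    # Compute each value from index 0 rather than accumulating, to avoid
    # drift from repeated floating-point additions.
    return [v0 + i * delta for i in range(n)]


def expand_ranges(ranges: Sequence[Tuple[Slice, Any]], n: int) -> List[Optional[Any]]:
    """Expand a hypothetical range-based compact format.

    Each entry associates one value with a (start, stop, step) slice; the
    slices must not overlap, and indices not covered by any slice are None.
    """
    out: List[Optional[Any]] = [None] * n
    covered = [False] * n
    for (start, stop, step), value in ranges:
        for i in range(start, stop, step):
            if not 0 <= i < n:
                raise ValueError(f"index {i} outside dimension of length {n}")
            if covered[i]:
                raise ValueError(f"index {i} covered by more than one slice")
            covered[i] = True
            out[i] = value
    return out


# Simulation time of each frame (fs), given by its first two values.
times_fs = expand_linear([0.0, 0.5], 4)
# Thermostat temperature set only for frames 0-99 and 200-299.
temperature = expand_ranges([((0, 100, 1), 300.0), ((200, 300, 1), 350.0)], 300)
```

A sparse format for a few isolated frames could reuse the same machinery with one-element slices such as `(17, 18, 1)`, although a dedicated index-to-value mapping might be more compact.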
Implementation notes
When we need to define a new format, we will have to:
- adapt the specs to define the new format, both in the definition of the `compactable` field and in the section about compact formats;
- adapt the property definition schema.
- If the new format is only used for new properties, a minor version change is sufficient.
- If a compactable field axis is changed from one compact format to another, also in this case a minor version change is sufficient (the client can discover this).
- If a non-compactable field is turned into a compactable one, it is still to be discussed whether this requires a major version change (as a client might assume a non-compact format and misinterpret the data).
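The last two points hinge on the client being able to tell which layout it received. A minimal sketch, assuming that each axis carries a declared compact-format name (or none when the data is sent in full): the client dispatches on that name and fails loudly on names it does not know. A length check on uncompacted data is one safeguard against the misinterpretation raised in the last point, since a one-element list where n values were expected is rejected instead of being read as a single frame. `decode_axis`, `DECODERS` and `UnsupportedCompactFormat` are hypothetical names, not part of any OPTIMADE client library.

```python
import copy
from typing import Any, Callable, Dict, List, Optional, Sequence


class UnsupportedCompactFormat(Exception):
    """Raised when the server declares a compact format this client does not know."""


def _expand_constant(values: Sequence[Any], n: int) -> List[Any]:
    if len(values) != 1:
        raise ValueError("constant format expects exactly one element")
    return [copy.deepcopy(values[0]) for _ in range(n)]


# Hypothetical registry mapping a compact-format name to its decoder.
# A client written before a new format existed simply lacks an entry for it.
DECODERS: Dict[str, Callable[[Sequence[Any], int], List[Any]]] = {
    "constant": _expand_constant,
}


def decode_axis(fmt: Optional[str], values: Sequence[Any], n: int) -> List[Any]:
    """Return the full list for one axis of length n.

    fmt is the compact format declared for this axis, or None if the data is
    sent uncompacted (how fmt is conveyed in a response is an assumption).
    """
    if fmt is None:
        # Uncompacted data must already have the full length; a shorter list
        # here most likely means compacted data that was not detected.
        if len(values) != n:
            raise ValueError(f"expected {n} values, got {len(values)}")
        return list(values)
    decoder = DECODERS.get(fmt)
    if decoder is None:
        # Fail loudly instead of silently misreading an unknown layout.
        raise UnsupportedCompactFormat(fmt)
    return decoder(values, n)
```

With this design, introducing a new format in a minor version only means a new registry entry on clients that support it, while older clients raise a clear error rather than returning wrong data.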
Note: we might need to issue a bug-fix release of the property specification released in 1.2, since the `compactable` field was not described there. A bug-fix release might be sufficient, since no 1.2-compatible clients have been implemented yet.