
Conversation


@jhlegarreta jhlegarreta commented Nov 21, 2025

Validate PET data objects' attributes at instantiation: ensures that the attributes are present and match the expected dimensionalities.

PET class attributes
Refactor the PET attributes so that the required (frame_time and uptake) and optional (frame_duration, midframe, total_duration) parameters are accepted by the constructor. Although the optional parameters can be computed from the required ones, excluding them from the __init__ (using the init=False attrs option) would mean that when dumping a PET instance to an HDF5 file, further processing would be required to exclude those elements so that the file can be read back, and they would need to be recomputed at every instantiation. Also, they may take user-provided values, so the constructor needs to allow them.

Although uptake can also be computed from the PET frame data, the rationale for requiring it is similar to that for the DWI class bzero: users can compute the uptake using their preferred strategy and provide it to the constructor. For the from_nii function, if a callable is provided, it will be used to compute the value; otherwise, a default strategy is used.

Validate and format attributes so that the computation of the relevant temporal and uptake attributes happens in a single place, i.e. when instantiating the PET object. This avoids potential inconsistencies.

Time-origin shift the frame_time values when formatting them.
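Shifting the time origin amounts to subtracting the first frame onset, as in this small helper (the function name is illustrative):

```python
import numpy as np


def shift_time_origin(frame_time):
    """Shift frame onset times so that the acquisition starts at t=0."""
    frame_time = np.asarray(frame_time, dtype=np.float32)
    return frame_time - frame_time[0]
```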

Make the _compute_uptake_statistic public so that users can call it.

from_nii function:
Refactor the from_nii function to accept filenames instead of a mix of filenames (e.g. the PET image sequence and brainmask) and temporal and uptake attribute arrays. This honors the name of the function, increases consistency with the dMRI counterpart, and offers a uniform API. It also allows the required and optional parameters to be read from the provided files so that they are available when instantiating the PET object.

Use the get_data utils function in from_nii to automatically handle the data type when loading the PET data.

PET.load class method:
Remove the PET.load class method and rely on the data.__init__.load function:

  • If an HDF5 filename is provided, it is assumed to host all necessary information, and the data module load function should take care of loading all data.
  • If the provided arguments are NIfTI files plus other data files, the function will call the pet.PET.from_nii function.
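The dispatch described above can be sketched as follows; the helper names are placeholders, not the actual nifreeze functions:

```python
from pathlib import Path


def _from_hdf5(filename):
    # Placeholder for PET.from_filename(...) reading a self-contained HDF5 file.
    return f"hdf5:{filename.name}"


def _from_nii(filename, **kwargs):
    # Placeholder for pet.PET.from_nii(...) reading NIfTI plus sidecar files.
    return f"nii:{filename.name}:{sorted(kwargs)}"


def load(filename, **kwargs):
    """Route HDF5 files to the HDF5 reader and anything else to from_nii."""
    filename = Path(filename)
    if filename.suffix in (".h5", ".hdf5"):
        if kwargs:
            raise ValueError("HDF5 files are self-contained; no extra arguments expected")
        return _from_hdf5(filename)
    return _from_nii(filename, **kwargs)
```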

Change the kwargs handling so that the relevant keyword arguments now accepted by the from_nii function can be identified.

Change accordingly the PET.load(pet_file, json_file) call in the PET notebook and the test_pet_load test function.

Tests:
Refactor the PET data creation fixture in conftest.py to accept the required/optional arguments and to return the necessary data.

Refactor the tests accordingly and increase consistency with the dmri data module testing helper functions. Reduces cognitive load and maintenance burden.

Add additional object instantiation equality checks: check that objects instantiated by reading NIfTI files equal objects instantiated directly.

Check the PET dataset attributes systematically in round trip tests by collecting all named attributes that need to be tested.

Modify accordingly the PET model and integration tests.

Modify test parameterization values to values that make sense (i.e. are consistent with the way they are computed from the frame_time attribute).

Take advantage of the patch set to make other opinionated choices:

  • Prefer using the global setup_random_pet_data fixture over the local random_dataset fixture: it allows controlling the parameters of the generated data and increases consistency with the practice adopted across the dMRI dataset tests. Remove the random_dataset fixture.
  • Prefer using assert np.allclose over np.testing.assert_array_equal for the sake of consistency.
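Note that the two helpers are not interchangeable: np.allclose compares within a tolerance, while np.testing.assert_array_equal demands bit-exact equality:

```python
import numpy as np

a = np.array([0.1 + 0.2, 1.0])  # 0.1 + 0.2 is 0.30000000000000004 in binary floats
b = np.array([0.3, 1.0])

# allclose absorbs floating-point round-off within its default tolerances,
# whereas exact comparison (what assert_array_equal performs) fails here.
assert np.allclose(a, b)
assert not np.array_equal(a, b)
```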

@jhlegarreta
Contributor Author

Depends on PR #335.

@jhlegarreta
Contributor Author

@mnoergaard While working on this I've realized that, as things stand now on main, there is a risk that two instances of a PET object contain different data depending on whether they are instantiated directly (i.e. PET()) or from a NIfTI file (i.e. from_nii). So, I would like to have your confirmation on the following:

  • To build a valid PET instance, the only required pieces of information are the data sequence and frame_time, as uptake can be computed from the former, and midframe and total_duration can both be computed from the latter.

I think that answer will allow me to refactor this properly and avoid inconsistencies. Thanks.

@mnoergaard
Contributor

> @mnoergaard While working on this I've realized that, as things stand now on main, there is a risk that two instances of a PET object contain different data depending on whether they are instantiated directly (i.e. PET()) or from a NIfTI file (i.e. from_nii). So, I would like to have your confirmation on the following:
>
>   • To build a valid PET instance, the only required pieces of information are the data sequence and frame_time, as uptake can be computed from the former, and midframe and total_duration can both be computed from the latter.
>
> I think that answer will allow me to refactor this properly and avoid inconsistencies. Thanks.

@jhlegarreta - that is correct! Thanks.

@jhlegarreta jhlegarreta force-pushed the enh/validate-pet-data-attrs branch 2 times, most recently from 745e408 to bc7617e on November 22, 2025 20:41

jhlegarreta commented Nov 22, 2025

@mnoergaard Please see whether the refactoring of the PET data class instantiation and the from_nii function makes sense. Before going further in the suggested direction and fixing the remaining tests, I would like to confirm this. Thanks.

@jhlegarreta jhlegarreta force-pushed the enh/validate-pet-data-attrs branch 3 times, most recently from 803b020 to 1eae4b2 on November 23, 2025 16:00
@jhlegarreta
Contributor Author

An additional refactoring is pending (in a separate commit) to stick to the convention adopted for the dMRI data, splitting the nifreeze/data/pet.py contents across nifreeze/data/pet/base.py, nifreeze/data/pet/io.py and nifreeze/data/pet/utils.py modules, once #336 (comment) is resolved and the tests get fixed.

@jhlegarreta jhlegarreta force-pushed the enh/validate-pet-data-attrs branch 2 times, most recently from ffe563c to 37ed54c on November 24, 2025 01:50
@jhlegarreta
Contributor Author

Re #336 (comment): as of commit 37ed54c and the init=False option, I dug into this a little more:

  • I see that to_filename dumps the PET class instance entirely to the HDF5 file, including its private attributes (i.e. those that can be computed from the data: midframe, total_duration, etc.). Thus, when trying to read and instantiate a PET object from the HDF5 contents, it fails because it is given all these data, including the private attributes.
    Due to the way uptake is computed (using a callable that is not stored anywhere), and since it does not make sense to instantiate a PET object, have the private attributes computed, and immediately afterwards set them to other values, it would be best to allow all of them to be present in the constructor, falling back to the default way of computing them if not provided.
  • I see that the load class method takes a JSON file with the frame duration and frame time start data. The frame time start looks like it should be an attribute of the class that can be given at instantiation, defaulting to
    frame_time_arr = np.array(self.frame_time, dtype=np.float32)
    frame_time_arr -= frame_time_arr[0]
    
    if not given.

Also, it should be possible to host and read the midframe and total duration data from a JSON file (or any other file), much like is done with the frame duration data, falling back to the default way of computing them if not present.
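A sketch of such a reader with fallbacks (the "FrameDuration" key exists in BIDS PET sidecars; the "MidFrame" and "TotalDuration" keys and the fallback computations are assumptions for illustration):

```python
import json
from pathlib import Path

import numpy as np


def read_temporal_metadata(json_path, frame_time):
    """Prefer values from a JSON sidecar; compute fallbacks from frame_time."""
    meta = json.loads(Path(json_path).read_text()) if json_path else {}
    frame_time = np.asarray(frame_time, dtype=float)
    frame_time = frame_time - frame_time[0]  # time-origin shift
    # Assumed fallback: reuse the last inter-frame delta for the final frame.
    deltas = np.diff(frame_time)
    frame_duration = np.asarray(meta.get("FrameDuration", np.append(deltas, deltas[-1])), dtype=float)
    midframe = np.asarray(meta.get("MidFrame", frame_time + frame_duration / 2), dtype=float)
    total_duration = float(meta.get("TotalDuration", frame_time[-1] + frame_duration[-1]))
    return frame_duration, midframe, total_duration
```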

Refactoring things this way (i.e. allowing all attributes to be provided to the PET class and the from_nii function, and falling back to defaults if not given) would probably allow us to solve all the failing tests that persist.

WDYT @mnoergaard?

Sorry for so many questions. Hopefully we will converge and the implementation across the multiple ways to read/write data will be consistent and robust after this.

@jhlegarreta jhlegarreta force-pushed the enh/validate-pet-data-attrs branch 3 times, most recently from 6751a06 to bbe0ca0 on November 27, 2025 15:27

jhlegarreta commented Nov 27, 2025

So, we need to discuss the following:

  • We can arrange things so that the PET object only contains midframe and total_duration data, which are the only data required for the current PET model to be instantiated:
    https://github.com/jhlegarreta/nifreeze/blob/bbe0ca0e6c7edcc8f8f604fdacb75beb6ca112ac/src/nifreeze/estimator.py#L267

    This would mean that from_nii should take care of computing the attributes based exclusively on the frame_time:
    https://github.com/jhlegarreta/nifreeze/blob/bbe0ca0e6c7edcc8f8f604fdacb75beb6ca112ac/src/nifreeze/data/pet.py#L345-L359

    The uptake value would no longer be present as an attribute. This would also mean that the HDF5 file would only contain midframe and total_duration data, and we would not be able to reproduce the original frame_time values.

  • Whether the lofo_split function is redundant and/or adds unnecessary overhead by writing data to an HDF5 file and reading it back, as everything that is required for the masking

     mask = np.ones(self.dataobj.shape[-1], dtype=bool)
     mask[index] = False
    

    is the dataobj shape, which can be accessed through pet_dataset present in the estimator.

  • By exposing all attributes, a user could instantiate a PET object that does not contain correct information (e.g. provides midframe values that are clearly wrong). So making the PET class expose all attributes was probably not a good idea.

We probably want to override the base class to_nifti function so that we serialize at least the temporal data (whether to include the uptake data depends on the above discussion) to a JSON file. That way we provide a uniform API (i.e. from_nii now requires temporal data read from a JSON file) that is consistent across the tool.
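An overridden to_nifti could, for instance, delegate the image writing to the base class and then add a JSON sidecar. Below is a minimal sketch of only the sidecar-writing step; the FrameTimesStart/FrameDuration field names come from BIDS PET metadata, everything else (function name, sidecar naming) is an assumption:

```python
import json
from pathlib import Path

import numpy as np


def write_temporal_sidecar(dataset, out_path):
    """Write temporal attributes to a JSON sidecar next to the NIfTI file.

    Assumes the base-class to_nifti has already written the image to out_path.
    """
    out_path = Path(out_path)
    sidecar = out_path.with_suffix(".json")
    sidecar.write_text(
        json.dumps(
            {
                "FrameTimesStart": np.asarray(dataset.frame_time).tolist(),
                "FrameDuration": np.asarray(dataset.frame_duration).tolist(),
            }
        )
    )
    return sidecar
```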

@jhlegarreta jhlegarreta force-pushed the enh/validate-pet-data-attrs branch 7 times, most recently from 1353c92 to e7a03a1 on November 27, 2025 23:15

**Dependencies**
Require `attrs>24.1.0` so that `attrs.Converter` can be used.
Documentation:
https://www.attrs.org/en/25.4.0/api.html#converters
@jhlegarreta jhlegarreta force-pushed the enh/validate-pet-data-attrs branch from e7a03a1 to 747b002 on November 29, 2025 17:08
Development

Successfully merging this pull request may close these issues:

Ensure the dataset mandatory attributes are present in data instantiation