
Conversation


@jhlegarreta jhlegarreta commented Nov 21, 2025

Validate PET data objects' attributes at instantiation: ensures that the attributes are present and match the expected dimensionalities.

PET class attributes
Refactor the PET attributes so that the required (frame_time and uptake) and optional (frame_duration, midframe, total_duration) parameters are accepted by the constructor. Although the optional parameters can be computed from the required ones, excluding them from the __init__ (using the init=False attrs option) would mean that when dumping a PET instance to an HDF5 file, further processing would be required to exclude those elements so that the file can be read back, and they would need to be recomputed at every instantiation. Also, they may take user-provided values, so the constructor needs to allow them.

Although uptake can also be computed from the PET frame data, the rationale for requiring it is similar to that for the DWI class bzero: users can compute the uptake using their preferred strategy and provide it to the constructor. For the from_nii function, if a callable is provided, it will be used to compute the value; otherwise, a default strategy is used.

Validate and format attributes so that the computation of the relevant temporal and uptake attributes happens in a single place, i.e. when instantiating the PET object. This avoids potential inconsistencies.

Time-origin shift the frame_time values when formatting them.
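Shifting the time origin amounts to subtracting the first frame onset, as in this small helper (the function name is illustrative):

```python
import numpy as np


def shift_time_origin(frame_time):
    """Shift frame onset times so that the acquisition starts at t=0."""
    frame_time = np.asarray(frame_time, dtype=np.float32)
    return frame_time - frame_time[0]
```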

Make the _compute_uptake_statistic public so that users can call it.

from_nii function:
Refactor the from_nii function to accept filenames instead of a mix of filenames (e.g. the PET image sequence and brainmask) and temporal and uptake attribute arrays. This honors the name of the function, increases consistency with the dMRI counterpart, and offers a uniform API. It also allows the required and optional parameters to be read from the provided files so that they are available when instantiating the PET object.

Use the get_data utils function in from_nii to automatically handle the data type when loading the PET data.

PET.load class method:
Remove the PET.load class method and rely on the data.__init__.load function:

  • If an HDF5 filename is provided, it is assumed to host all necessary information, and the data module load function should take care of loading all data.
  • If the provided arguments are NIfTI files plus other data files, the function will call the pet.PET.from_nii function.
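The dispatch described above can be sketched as follows; the helper names are placeholders, not the actual nifreeze functions:

```python
from pathlib import Path


def _from_hdf5(filename):
    # Placeholder for PET.from_filename(...) reading a self-contained HDF5 file.
    return f"hdf5:{filename.name}"


def _from_nii(filename, **kwargs):
    # Placeholder for pet.PET.from_nii(...) reading NIfTI plus sidecar files.
    return f"nii:{filename.name}:{sorted(kwargs)}"


def load(filename, **kwargs):
    """Route HDF5 files to the HDF5 reader and anything else to from_nii."""
    filename = Path(filename)
    if filename.suffix in (".h5", ".hdf5"):
        if kwargs:
            raise ValueError("HDF5 files are self-contained; no extra arguments expected")
        return _from_hdf5(filename)
    return _from_nii(filename, **kwargs)
```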

Change the kwargs handling so that the relevant keyword arguments now accepted by the from_nii function can be identified.

Change accordingly the PET.load(pet_file, json_file) call in the PET notebook and the test_pet_load test function.

Tests:
Refactor the PET data creation fixture in conftest.py to accept the required/optional arguments and to return the necessary data.

Refactor the tests accordingly and increase consistency with the dmri data module testing helper functions. Reduces cognitive load and maintenance burden.

Add additional object instantiation equality checks: check that objects instantiated by reading NIfTI files equal objects instantiated directly.

Check the PET dataset attributes systematically in round trip tests by collecting all named attributes that need to be tested.

Modify accordingly the PET model and integration tests.

Modify test parameterization values to values that make sense (i.e. are consistent with the way they are computed from the frame_time attribute).

Take advantage of the patch set to make other opinionated choices:

  • Prefer using the global setup_random_pet_data fixture over the local random_dataset fixture: it allows controlling the parameters of the generated data and increases consistency with the practice adopted across the dMRI dataset tests. Remove the random_dataset fixture.
  • Prefer using assert np.allclose over np.testing.assert_array_equal for the sake of consistency.
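Note that the two helpers are not interchangeable: np.allclose compares within a tolerance, while np.testing.assert_array_equal demands bit-exact equality:

```python
import numpy as np

a = np.array([0.1 + 0.2, 1.0])  # 0.1 + 0.2 is 0.30000000000000004 in binary floats
b = np.array([0.3, 1.0])

# allclose absorbs floating-point round-off within its default tolerances,
# whereas exact comparison (what assert_array_equal performs) fails here.
assert np.allclose(a, b)
assert not np.array_equal(a, b)
```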

@jhlegarreta
Contributor Author

Depends on PR #335.

@jhlegarreta
Contributor Author

@mnoergaard While working on this I've realized that, as things stand now on main, there is a risk that two instances of a PET object contain different data depending on whether they are instantiated directly (i.e. PET()) or from a NIfTI file (i.e. from_nii). So, I would like to have your confirmation on the following:

  • To build a valid PET instance, the only required pieces of information are the data sequence and frame_time, as uptake can be computed from the former, and midframe and total_duration can both be computed from the latter.

I think that answer will allow me to refactor this properly and avoid inconsistencies. Thanks.

@mnoergaard
Contributor

> @mnoergaard While working on this I've realized that, as things stand now on main, there is a risk that two instances of a PET object contain different data depending on whether they are instantiated directly (i.e. PET()) or from a NIfTI file (i.e. from_nii). So, I would like to have your confirmation on the following:
>
>   • To build a valid PET instance, the only required pieces of information are the data sequence and frame_time, as uptake can be computed from the former, and midframe and total_duration can both be computed from the latter.
>
> I think that answer will allow me to refactor this properly and avoid inconsistencies. Thanks.

@jhlegarreta - that is correct! Thanks.

@jhlegarreta jhlegarreta force-pushed the enh/validate-pet-data-attrs branch 2 times, most recently from 745e408 to bc7617e on November 22, 2025 20:41

jhlegarreta commented Nov 22, 2025

@mnoergaard Please see whether the refactoring of the PET data class instantiation and the from_nii function makes sense. Before going further in the suggested direction and fixing the remaining tests, I would like to confirm this. Thanks.

@jhlegarreta jhlegarreta force-pushed the enh/validate-pet-data-attrs branch 3 times, most recently from 803b020 to 1eae4b2 on November 23, 2025 16:00
@jhlegarreta
Contributor Author

An additional refactoring is pending (in a separate commit) to stick to the convention adopted for the dMRI data, splitting the nifreeze/data/pet.py contents across nifreeze/data/pet/base.py, nifreeze/data/pet/io.py and nifreeze/data/pet/utils.py modules, once #336 (comment) is resolved and the tests get fixed.

@jhlegarreta jhlegarreta force-pushed the enh/validate-pet-data-attrs branch 2 times, most recently from ffe563c to 37ed54c on November 24, 2025 01:50
@jhlegarreta
Contributor Author

Re #336 (comment): as of commit 37ed54c and the init=False option, I dug into this a little more:

  • I see that to_filename dumps the PET class instance entirely to the HDF5 file, including its private attributes (i.e. those that can be computed from the data: midframe, total_duration, etc.). Thus, when trying to read and instantiate a PET object from the HDF5 contents, it fails because it is given all these data, including the private attributes.
    Due to the way uptake is computed (using a callable that is not stored anywhere), and since it does not make sense to instantiate a PET object, have the private attributes computed, and immediately afterwards set them to other values, it would be best to allow all of them to be present in the constructor, falling back to the default way of computing them if not provided.
  • I see that the load class method takes a JSON file with the frame duration and frame time start data. The frame time start looks like it should be an attribute of the class that can be given at instantiation, defaulting to
    frame_time_arr = np.array(self.frame_time, dtype=np.float32)
    frame_time_arr -= frame_time_arr[0]
    
    if not given.

Also, it should be possible to host and read the midframe and total duration data from a JSON file (or any other file), much like is done with the frame duration data, falling back to the default way of computing them if not present.
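A sketch of such a reader with fallbacks (the "FrameDuration" key exists in BIDS PET sidecars; the "MidFrame" and "TotalDuration" keys and the fallback computations are assumptions for illustration):

```python
import json
from pathlib import Path

import numpy as np


def read_temporal_metadata(json_path, frame_time):
    """Prefer values from a JSON sidecar; compute fallbacks from frame_time."""
    meta = json.loads(Path(json_path).read_text()) if json_path else {}
    frame_time = np.asarray(frame_time, dtype=float)
    frame_time = frame_time - frame_time[0]  # time-origin shift
    # Assumed fallback: reuse the last inter-frame delta for the final frame.
    deltas = np.diff(frame_time)
    frame_duration = np.asarray(meta.get("FrameDuration", np.append(deltas, deltas[-1])), dtype=float)
    midframe = np.asarray(meta.get("MidFrame", frame_time + frame_duration / 2), dtype=float)
    total_duration = float(meta.get("TotalDuration", frame_time[-1] + frame_duration[-1]))
    return frame_duration, midframe, total_duration
```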

Refactoring things this way (i.e. allowing all attributes to be provided to the PET class and the from_nii function, and falling back to defaults if not given) would probably allow us to solve all the failing tests that persist.

WDYT @mnoergaard?

Sorry for so many questions. Hopefully we will converge and the implementation across the multiple ways to read/write data will be consistent and robust after this.

@jhlegarreta jhlegarreta force-pushed the enh/validate-pet-data-attrs branch 3 times, most recently from 6751a06 to bbe0ca0 on November 27, 2025 15:27

jhlegarreta commented Nov 27, 2025

So, we need to discuss the following:

  • We can arrange things so that the PET object only contains midframe and total_duration data, which are the only data required for the current PET model to be instantiated:
    https://github.com/jhlegarreta/nifreeze/blob/bbe0ca0e6c7edcc8f8f604fdacb75beb6ca112ac/src/nifreeze/estimator.py#L267

    This would mean that from_nii should take care of computing the attributes based exclusively on the frame_time:
    https://github.com/jhlegarreta/nifreeze/blob/bbe0ca0e6c7edcc8f8f604fdacb75beb6ca112ac/src/nifreeze/data/pet.py#L345-L359

    The uptake value would no longer be present as an attribute. This would also mean that the HDF5 file would only contain midframe and total_duration data, and we would not be able to reproduce the original frame_time values.

  • Whether the lofo_split function is redundant and/or adds unnecessary overhead by writing data to an HDF5 file and reading it back, as everything that is required for the masking

     mask = np.ones(self.dataobj.shape[-1], dtype=bool)
     mask[index] = False
    

    is the dataobj shape, which can be accessed through pet_dataset present in the estimator.

  • By exposing all attributes, a user could instantiate a PET object that does not contain correct information (e.g. provides midframe values that are clearly wrong). So making the PET class expose all attributes was probably not a good idea.

We probably want to override the base class to_nifti function so that we serialize at least the temporal data (whether to include the uptake data depends on the above discussion) to a JSON file. That way we provide a uniform API (i.e. from_nii now requires temporal data read from a JSON file) that is consistent across the tool.
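An overridden to_nifti could, for instance, delegate the image writing to the base class and then add a JSON sidecar. Below is a minimal sketch of only the sidecar-writing step; the FrameTimesStart/FrameDuration field names come from BIDS PET metadata, everything else (function name, sidecar naming) is an assumption:

```python
import json
from pathlib import Path

import numpy as np


def write_temporal_sidecar(dataset, out_path):
    """Write temporal attributes to a JSON sidecar next to the NIfTI file.

    Assumes the base-class to_nifti has already written the image to out_path.
    """
    out_path = Path(out_path)
    sidecar = out_path.with_suffix(".json")
    sidecar.write_text(
        json.dumps(
            {
                "FrameTimesStart": np.asarray(dataset.frame_time).tolist(),
                "FrameDuration": np.asarray(dataset.frame_duration).tolist(),
            }
        )
    )
    return sidecar
```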

@jhlegarreta jhlegarreta force-pushed the enh/validate-pet-data-attrs branch 7 times, most recently from 1353c92 to e7a03a1 on November 27, 2025 23:15

**Dependencies**
Require `attrs>24.1.0` so that `attrs.Converter` can be used.
Documentation:
https://www.attrs.org/en/25.4.0/api.html#converters
@jhlegarreta jhlegarreta force-pushed the enh/validate-pet-data-attrs branch from e7a03a1 to 747b002 on November 29, 2025 17:08
Development

Successfully merging this pull request may close these issues:

Ensure the dataset mandatory attributes are present in data instantiation